STATISTICS 
IN EDUCATION 
AND PSYCHOLOGY 
A First Course

MERLE W. TATE




The Macmillan Company - NEW YORK


Collier-Macmillan Limited - LONDON 




© Copyright, Merle W. Tate, 1965


All rights reserved. No part of this book 
may be reproduced or utilized in any form 
or by any means, electronic or mechanical, 
including photocopying, recording or by 
any information storage and retrieval 
system, without permission in writing from
the Publisher. 


FIRST PRINTING 


Earlier edition, entitled, Statistics in Education, 
© copyright 1955 by The Macmillan Company.


Library of Congress catalog card number: 65-20636 


THE MACMILLAN COMPANY, NEW YORK 
COLLIER-MACMILLAN CANADA, LTD., TORONTO, ONTARIO 


PRINTED IN THE UNITED STATES OF AMERICA 


Preface 


Although presented under a different title, this book is in part an 
abridgment and in part a revision of the author's Statistics in Education, 
published in 1955. 

The sections of the earlier book dealing with approximate numbers 
and computation, tables and graphs, geometric and harmonic means, 
the coefficient of variation, multiple regression and correlation beyond 
the three-variable case, the binomial and F sampling distributions, 
and the derivation of formulas have been deleted. Various other 
sections have been shortened. 

The revision has resulted in reorganization of Chapters 4, 6, and 8 of 
the earlier book. Transformations of scores and the normal, t, and
chi-square sampling distributions are each treated in separate chapters.
The principal changes in notation include use of s, σ, and s' for the
sample, population, and estimated population standard deviations,
respectively; $ for a population proportion; ρxy for a population
product-moment coefficient of correlation; and z for a standard score
in a normal sampling distribution. Principal additions include the 
Wilcoxon, Kruskal-Wallis, and Friedman rank tests and more extended 
treatment of confidence intervals. The former seems desirable because 
the rank tests are intuitively simple, widely useful, and nearly as power- 
ful as their parametric analogues; the latter, because confidence inter- 
vals, in addition to separating admissible from inadmissible hypotheses, 
provide a relatively simple way of judging the importance of a significant 
difference and the risk of the Type II error in a nonsignificant difference. 

The present book is intended primarily as a text for a one-semester 
course for students in education and psychology. It assumes no
mathematics beyond high school algebra; the approach to statistical ideas is
through intuition and arithmetic. The use of symbolic language is
minimized. The chief aims of the text, exercises, and answers to exer- 
cises are to emphasize the assumptions underlying statistical methods, 
the interpretation of statistics, and sound statistical thinking. 

I am indebted to the Literary Executor of the late Sir Ronald A.
Fisher, F. R. S., Cambridge, to Dr. Frank Yates, F. R. S., Rothamsted, 
and to Oliver & Boyd Ltd., Edinburgh, for their permission to reprint 
Tables E and F from their books Statistical Methods for Research 
Workers and Statistical Tables for Biological, Agricultural, and Medical 
Research. I am also indebted to Catherine Thompson and to E. S. 
Pearson, Editor of Biometrika, for Table G, which is an abridgment 
of a table originally published in Biometrika. My indebtedness to other 
authors and publishers for permission to use various materials is 
acknowledged at appropriate places in later pages. 

I wish to thank Professors Sara M. Brown, Berj Harootunian, 
Fred W. Ohnmacht, and Ralph C. Preston for advice about the revision 
and for reading parts of the manuscript. I wish also to thank Jane
Amsterdam who helped with the exercises and answers. 


MERLE W. TATE 


Philadelphia 


Contents 


PREFACE 


I Introduction
    Statistics as a Tool in Research
    Statistical Data
    The Study of Statistics
    Exercises

II Organization and Presentation of Statistical Data
    The Frequency Distribution
    Graphical Presentation of the Frequency Distribution
    Types of Frequency Distributions
    Exercises

III Characteristics of Statistical Series. Central Tendency
    Measures of Central Tendency
    The Mode
    The Median
    The Arithmetic Mean
    Interpretation and Use of Measures of Central Tendency
    Exercises

IV Characteristics of Statistical Series. Variability
    The Range
    Interpercentile Measures
    The Average Deviation
    The Standard Deviation
    Interpretation and Use of Measures of Variability
    Skewness and Kurtosis
    Exercises

V Transformations of Scores
    Percentile Ranks
    Standard Scores
    Exercises

VI The Normal Curve
    The Normal Curve as a Limiting Form
    Areas and Frequencies Under the Normal Curve
    Uses of the Normal Curve
    Exercises

VII Correlation and Regression
    Statistical Correlation
    Correlation in the Social Sciences
    The Product-Moment Coefficient of Correlation
    The Meaning of Correlation
    Linear Regression
    The Regression Equation in Prediction
    Analysis and Interpretation of Relationship
    Correlation and Causation
    Special Applications of Product-Moment Correlation
    Relationships Among Three or More Variables
    Exercises

VIII Reliability and Validity of Statistical Evidence
    The Conditions of Trustworthy Evidence
    Relations Between Errors of Measurement and Reliability
    Interpretation and Use of Estimates of Reliability
    Test Item Analysis
    Exercises

IX Statistical Inference
    Sampling Theory and Statistical Inference
    Exercises

X The Normal Sampling Distribution
    Inference from Single Samples
    Assumptions Underlying the Normal Sampling Distribution
    Inferences from Two Independent Samples
    Inferences from Two Related Samples
    Exercises

XI The t Sampling Distribution
    Degrees of Freedom
    Areas Under t Curves
    Inferences from Single Samples
    Inferences from Two Samples
    Exercises

XII The χ² Sampling Distribution
    The Distribution of χ²
    Applications to Frequency Data
    Applications to Ranked Data
    Exercises

REFERENCES
APPENDIX
TABLES A-J
ANSWERS TO SELECTED EXERCISES
INDEX




CHAPTER I 


Introduction 


Statistics as a Tool in Research 


The search for knowledge needed in solving problems is universal 
and endless. Outside of textbooks, few problems admit of only one 
solution. Since different kinds of knowledge may be obtained, it is 
typically the case that several solutions can be found for a given prob- 
lem. The question of how a particular solution is reached is an
important one, for it relates to that most persistent and complex question,
"How do we know?"


SOURCES OF KNOWLEDGE 


Generally speaking, we attempt to obtain the knowledge needed 
in solving a particular problem from one or more of four sources: 
(1) authority: expert testimony, opinions of specialists; (2) inertia: 
habit, custom, tradition; (3) intuition: self-evident propositions, in- 
disputable premises, obvious truths; and (4) evidence: matters of fact.* 
Although the fourth source has come to have a prestige denied the 
other three, each of the three is, at one time or another, of value. 


* There is unfortunately no simple definition of fact. As used here, fact refers 
to something known, or capable of verification, directly through experience. Facts 
are here considered to be the stuff of evidence. They are not true or false, they just 
are, and constitute the criteria which make statements true or false. 


Appeals to custom are useful in deciding questions relating to social 
amenities; appeals to authority or to intuition may be invaluable in 
dealing with a novel problem or one about which little is known. There 
are a great many problems in school and society for which little trust- 
worthy evidence is available; there are many others whose solutions, 
despite a wealth of facts, turn on imponderables. 

Consider a group of college officials who are attempting to decide 
whether to adopt some proposed change in admissions procedures, 
programs of study, or grading practices. The officials may resort to 
authority and solicit the opinions of specialists regarding the proposed 
change. Custom and tradition may operate in either of two ways: 
The officials may decide to “let well enough alone” and to go on doing 
what has been done in the past, or they may survey comparable colleges 
and decide on the practice which is the most popular or customary. 
As a third alternative, the officials may make certain assumptions 
about the needs of students and society and attempt to reach a decision 
by “if-then” argument, i.e., by logical deductions from the assumptions. 
If this is done, the decision will, of course, depend on the assumptions 
that are made. 

As a fourth alternative, the officials may gather evidence relating to 
the results of the proposed change—as these have been observed in 
other colleges or are observed through experimentation—and make 
their decision in accordance with the evidence, with the expectation 
that the same results will obtain in the future. It is quite likely that the 
evidence will be inconclusive in one or more respects, and that con- 
sequently the final decision will have to be reached by weighing various 
considerations. 

Most of our social problems are enormously complex, and it 
would be wrong to suppose that given the available facts they can be 
solved. But it would be equally wrong to believe that workable solu- 
tions can be found without facts or in opposition to facts. We ordinarily 
expect expert testimony, at least in temporal matters, to be based upon 
matters of fact, if such are available. Custom and tradition give way, 
although slowly, when contradicted by the facts. The "reasonable" 
assumptions and the “indisputable” premises of convenient syllogisms 
are constantly subject to scrutiny in the light of facts. 

When evidence is available or can be obtained, it is the mark of 
wisdom to use it. As a matter of common observation, factually 
supported solutions tend to be more widely acceptable, convincing,
and more successful than any other kind. Moreover, whatever the
method by which a particular problem is solved, the merit of the solution 
ordinarily is judged by its observable consequences. 


RESEARCH AND STATISTICS 


The search for factual solutions to problems commonly is called 
research. More formally, research is the systematic collection, analysis, 
and interpretation of facts relating to a specific problem. Statistics, as 
a tool in research, deals with methods of collecting and interpreting 
numerical facts. A great many of the social problems which invite 
research are related to the measurement of individual differences, the 
results of different treatments on individuals and groups, and the 
organization and administration of social agencies. Problems of the 
first two classes usually involve measuring instruments, such as tests 
and inventories, and correlational and experimental procedures; those 
of the third class, records of revenues and expenditures, recorded 
characteristics of people and events, and the effects of different practices 
in the past. When the evidence collected by testing, experimentation, 
or record-keeping is stated as numerical facts, as it usually is, statistical 
methods are essential to analysis and interpretation. 

Some social scientists are suspicious of figures. They believe that 
presenting evidence relating to man and his affairs in the form of 
numerical facts necessarily devitalizes and distorts the phenomenon 
which is under investigation. This amounts to the belief that social 
problems must be dealt with mainly on authoritative, traditional, or 
intuitive grounds. Reliable evidence usually can be quantified, and 
numerical facts are the results of quantification. It does not follow, of 
course, that quantification makes evidence reliable. 

The answers to criticism of statistical method in research usually 
reduce to the simple and practical one that it is demonstrably the most 
successful tool we have in dealing with numerous problems. If the 
alternative were recourse to oracles or logical deduction from im- 
peccable premises, we could dispense with statistics—and with other 
research tools, for that matter. But we can attack few of our problems 
armed only with a priori wisdom and logic. If we are to deal with many 
of our problems on reliable grounds, it is necessary to employ statistics. 

Let us note in passing that statistics is extremely broad in applica- 
tion, the social sciences being only one of the fields which it serves. It 
is the tool par excellence for dealing quantitatively with phenomena in 
any field that are too complex for precisely controlled experimentation 
and too irregular for exact mathematical analysis. It is widely used
in agriculture, anthropology, biology, and medicine and, to an in- 
creasing extent, in the more exact sciences. An understanding of statisti- 
cal method is an aid to understanding the developments in many fields 
of study, and consequently makes an important contribution to the 
general education of the student. 


MEANING AND ORIGIN OF STATISTICS 


The word statistics has three common meanings. In its oldest
sense, it referred to any sort of facts, numerical or otherwise, which 
reflected the “conditions and prospects" of society or state. However, 
the meaning of statistics in this sense has been narrowed, so that today, 
when the word is used to mean facts characterizing society and the 
physical environment, numerical facts exclusively are implied. This
meaning is well illustrated in the following passage from Johnson
(Ref. 23, p. 1):


Our entrance into and departure from this world are recorded as sta- 
tistical events. Birth and death, marriage and divorce, the school at- 
tendance of our children, the crops grown by farmers, the number of 
miles flown by commercial planes, the hours of our labor, the output of 
manufacturing plants, the acres of wood demanded for paper, the hours 
of sunshine, the inches of snowfall—all such events and activities are 
recorded somehow and somewhere. Myriads of such experiences and 
events affecting the daily lives of roundly two billion human beings
lie behind the statistical data condensed in volumes, published and
unpublished.


During the latter part of the nineteenth century, the word statistics 
acquired a second meaning. It came to refer to the theories and tech- 
niques involved in collecting, summarizing, and interpreting numerical 
facts, as well as to the facts themselves, i.e., statistics came to mean 
method or methods of dealing with numerical facts. (Ordinarily, when 
used to imply methodology, statistics is singular and takes the singular 
verb.) Statistical method originated in the calculation of insurance rates
on ships, in the study of the operation of chance in games and human 
affairs, and in the investigation of errors of observation in astronomy. 
Statistical method was applied to the social sciences by Quetelet (1796-1874)
in Belgium and Galton (1822-1911) in England, both of whom
saw in the method a quantitative and powerful tool for dealing with 
the mass data characterizing man and society. Primarily owing to the 
work and influence of Karl Pearson (1857-1936) and R. A. Fisher 
(1890-1962), English scientists and mathematicians, both theoretical 
and applied statistics expanded rapidly during the first third of the
twentieth century. At present, as has been noted, the method of 
statistics is widely used in various fields of research. 

In recent years, a third meaning has been given to statistics. When 
numerical facts are reduced to summary figures, such as averages, 
ranges, and percentages, the derived figures are frequently referred to 
as statistics and a single one as a statistic. In this sense, the arithmetic 
mean of a set of numerical data is a statistic. 

The three meanings of statistics are brought out rather well in a 
student's jest, "It's all perfectly clear; you compute statistics from 
statistics by statistics." The three uses of the term rarely cause real 
confusion, however, since the particular meaning is usually quite clear 
in context. 


THE NECESSITY OF STATISTICS IN THE REDUCTION OF DATA 


As a tool in research, statistical method renders two invaluable 
services. The first is that of enabling us to classify, organize, and sum- 
marize numerical facts so that they can be more readily comprehended 
and interpreted. Consider Table A, Appendix. As listed, the mass of 
information about the college freshmen is difficult to interpret. Various 
questions, such as how the freshmen compare with previous classes and 
with freshmen in other colleges and how those from public secondary 
schools compare with those from private schools, cannot be studied 
without first reducing the mass of data into more compact form. The 
information must be classified and summarized before the mind can 
comprehend its salient features. In further illustration, suppose a 
weather bureau has faithfully observed hourly temperatures during the 
past ten years. The bureau would have 24 × 365 × 10 temperature
readings, and unless some sort of reduction and summarization scheme 
were used, the very thoroughness of the observations would make them
hopelessly complicated.

In this connection, Fisher points out:


... Any investigator who has carried out methodical and extensive ob- 
servations will probably be familiar with the oppressive necessity of 
reducing his results to a more convenient bulk. No human mind is 
capable of grasping in its entirety the meaning of any considerable quan- 
tity of numerical data. We want to be able to express all the relevant 
information contained in the mass by means of comparatively few nu- 
merical values. . . . In all cases, perhaps, it is possible to reduce to a
simple numerical form the main issues which the investigator has in view,
in so far as the data are competent to throw light on such issues. The 
number of independent facts supplied by the data is usually far greater
than the number of facts sought, and in consequence much of the in- 
formation supplied by any body of actual data is irrelevant. It is the 
object of the statistical processes employed in the reduction of data to 
exclude this irrelevant information, and to isolate the whole of the rele- 
vant information contained in the data.* 


It is generally the case that a mass of numerical data is useful to 
the extent to which it can be summarized in tables or graphs and simply 
described in terms of frequencies, averages, variabilities, and
relationships. It may be said that the first use of statistics in research is to
“distill” raw data. The student need only consider the United States 
Census to convince himself that this service is indispensable. 
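
The weather-bureau illustration can be made concrete with a small sketch. It is offered only as an illustration: the temperature readings are invented (drawn from an arbitrary normal distribution with a hypothetical seed), and the particular summary figures chosen are an assumption about what an investigator might want, not a prescribed procedure.

```python
import random
from statistics import mean, median

rng = random.Random(0)  # hypothetical seed, so the sketch is repeatable

# Ten years of hourly readings, as in the text: 24 x 365 x 10 values.
n_readings = 24 * 365 * 10

# Invented temperatures (assumption: mean 55, standard deviation 18).
readings = [rng.gauss(55.0, 18.0) for _ in range(n_readings)]

# Reduce the mass of readings to a handful of summary figures.
summary = {
    "count": len(readings),
    "lowest": min(readings),
    "highest": max(readings),
    "mean": mean(readings),
    "median": median(readings),
}

for name, value in summary.items():
    print(f"{name:>8}: {value:,.1f}")
```

Five figures stand in for 87,600 readings; whether these five are the relevant ones depends, as Fisher notes, on the issues the investigator has in view.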


THE NECESSITY OF STATISTICS IN INFERENCE 


The second invaluable service rendered by statistics is that of en- 
abling us to draw conclusions, of a statable degree of exactness, about 
the probable nature of objects and events upon less than complete 
evidence. To quote from Fisher again: 


... From a limited experience, for example, of individuals of a species, 
or of the weather of a locality, we may obtain some idea of the infinite 
hypothetical population from which our sample is drawn, and so of the 
probable nature of future samples to which our conclusions are to be 
applied. If a second sample belies this expectation we infer that it is, 
in the language of statistics, drawn from a different population; that the 
treatment to which the second sample of organisms had been exposed 
did in fact make a material difference, or that the climate (or the methods 
of measuring it) had materially altered.†


Most of the knowledge we derive from the matters of fact relating 
to some issue is of probable rather than of certain nature, because only 
a limited number or sample of the facts is available. The conclusion 
that all men are mortal, for example, is not based upon the study of the 
life histories of all men. Men are living today and men are yet to be 
born. Rather, the conclusion is based upon the life histories of certain 
men who have been observed to be mortal and is really an inference we 
make from a sample. Similarly, knowledge based upon experiment 
that one method of teaching, say algebra, is superior to a second method 
is only an inference drawn from a sample. The experiment might be 
repeated over and over again, both at present and in the future. These 
are illustrations of situations in which we cannot possibly examine all 


* Reprinted from R. A. Fisher, Statistical Methods for Research Workers,
published 1950 by Oliver and Boyd, Ltd., Edinburgh, by permission, p. 6.
† R. A. Fisher, op. cit., p. 41.


of the facts relevant to a particular conclusion, for the simple reason 
that they are by nature endless and cannot be made available. In such 
situations, the total number or population of pertinent facts is con- 
sidered to constitute an infinite hypothetical population. Any investiga- 
tion of infinite populations necessarily is limited to samples. 

Some statistical populations, though finite, are so vast as to be 
practically inaccessible. In determining, say, the average price of staple 
groceries to the consumer, it is practically impossible to observe prices 
in all of the retail outlets even in a single city. In the various public 
opinion polls it would be practically impossible to poll all of the mem- 
bers in the population about which it is desired to draw conclusions. 
In these situations, samples as large as time and money permit are 
selected and studied to determine the probable character of the 
population. 

In other situations, sampling may be necessary because the drawing 
of inferences demands the destruction of the cases studied. In seed 
germination studies, in testing milk for butterfat content, in determining 
the durability of manufactured articles, and so on, the investigation 
processes make the cases unfit for further use. Here again, knowledge 
about the population must be derived from a sample. 

The possibility of making inferences about a population from the 
characteristics of a sample is fundamental in research work. Statistics 
provides a rigorous method of judging the reliability of inferences 
drawn from a sample. The fundamentals of sampling theory are com- 
plex and mathematically difficult, but a practical understanding of 
sound sampling procedures demands little more than common sense. 
It is only common sense to recognize that reliable inferences can be 
drawn only from representative samples and that the larger the sample 
the more reliable the inference it permits. If we desire to determine the 
mean height of Philadelphia women, for example, we would have more 
confidence in the information provided by a sample selected at large 
than in one provided by the members of a women's athletic club, and 
more confidence in a sample of 100 than in a sample of 10. 
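
The claim about sample size can be tried out by simulation. The following is a rough sketch under stated assumptions: the population of heights is invented (50,000 values from a normal distribution with mean 64 inches and standard deviation 2.5), the seed is arbitrary, and the function name is hypothetical. It draws many random samples of 10 and of 100 and compares how widely the sample means scatter.

```python
import random
from statistics import mean, pstdev

rng = random.Random(7)  # hypothetical seed

# Invented population of heights in inches (assumption, not real data).
population = [rng.gauss(64.0, 2.5) for _ in range(50_000)]

def spread_of_sample_means(sample_size, n_samples=1000):
    """Standard deviation of the means of repeated random samples."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(n_samples)]
    return pstdev(means)

spread_10 = spread_of_sample_means(10)
spread_100 = spread_of_sample_means(100)

print(f"scatter of means, samples of 10:  {spread_10:.2f}")
print(f"scatter of means, samples of 100: {spread_100:.2f}")
```

The means of the larger samples cluster more closely about the population mean, which is the sense in which the larger sample permits the more reliable inference.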

In research, statistics is necessary both in summarizing a sample of 
numerical facts and in drawing inferences from the sample. Both pro- 
cedures are fundamental in deriving knowledge from numerical facts. 
A distinction is sometimes made between summarizing or descriptive 
statistics and sampling statistics. Such a distinction is arbitrary and 
can be misleading. We nearly always generalize, consciously or un-
consciously, from a particular collection of data; in fact, it may be
doubted whether the human mind can keep from abstracting salient 
features of the data and forming impressions that bear on future judg- 
ments. Whether this is good or bad depends on whether the data are 
representative and properly analyzed. 


SELECTING A REPRESENTATIVE SAMPLE 


An argument based on sample evidence runs something like this: 
(1) The individuals in the sample are representative of a population of 
individuals; (2) certain facts are observed to characterize the individuals 
in the sample; (3) therefore, probably and approximately, the observed 
facts characterize the individuals in the population. The argument is 
simple, but the conditions which make it convincing are difficult to 
meet. In a later chapter we shall give unambiguous meaning to the 
words probably and approximately as they are used in statistics. At this 
point let us consider several conditions of sampling. 

Sample evidence has no demonstrable generality unless we know 
the population that was sampled and how the sample was selected. A 
statistical population, ordinarily referred to as population or universe, 
may consist of the attributes or performances of a specified group of 
persons; crop yields in a locality; elements of climate in a region for a 
stated period; characteristics of rural, village, or city schools of speci- 
fied size and location; attributes of manufactured articles of a given 
kind coming off an assembly line; characteristics of houses, farms, 
animals, and the like in a certain locality, state, or nation; or any other 
set of specified objects or events that possess a common characteristic 
in varying amounts or that have been selected according to a single 
principle. Statistical populations can always be expressed as numerical 
facts. Although this is putting the cart before the horse, it may be said 
that all of the individuals in a group to which a conclusion based upon 
sample evidence is to be applied must be thought of as members of the 
population from which the sample was selected. 

The exact specification or definition of the population involved in 
a research study is prerequisite to selecting a sample, both because it 
minimizes the danger of utilizing a nonrepresentative or biased sample 
and because it sets forth the logical limits of the inferences that are 
drawn from the sample evidence. The specification should set forth 
clearly just what objects or events are considered to constitute the 
population. 

After the population is specified, a "representative" sample is 
selected. Although there is no way of making sure that a sample is 
representative, both theoretical considerations and experience indicate
that a sample selected at random, i.e., by chance, is the most trust- 
worthy. When the method of sampling assures every individual in the 
population the same chance of being drawn as any other individual, the 
sample is said to be random. There are several methods of random 
sampling, all of which require that every individual in the population 
be listed. One of the most satisfactory and easiest to apply is based 
upon the use of random numbers, such as those included in Table H, 
Appendix. Let us demonstrate both the use of random numbers in 
sampling and the representative nature of a random sample by selecting 
a sample of 20 from the population of 400 scores in Table B, Appendix. 

We may begin by pointing at random to a number in Table H. 
Let us suppose that we have pointed to the number 8 in row 35, column 
7. From this point, or any point so selected, we may read up, down, 
sidewise, or diagonally, but once we have entered the table we should 
proceed in orderly fashion. Since the serial numbers of the 400 in- 
dividuals in the population are three-digit numbers, we shall need to 
include three random digits at each reading. Let us agree to read up- 
ward from the starting point in row 35, column 7, and to include the 
digits in columns 7, 8, and 9. The first three-digit number we encounter 
for which there is a corresponding serial number in the population 
is 097, the second 380, the third 347, and so on. When we reach the 
top of the columns, let us cross over to columns 10, 11, and 12 and 
read downward until we have 20 random numbers in the range 000 
to 399. The numbers so selected and the corresponding individual
scores in the population are

Number  Score    Number  Score    Number  Score    Number  Score
097     56       085     28       254     22       361     39
380     44       379     38       051     45       186     48
347     40       140     54       179     28       109     38
067     45       262     41       223     28       049     50
362     40       000     30       195     64       317     37
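
The reading procedure just described can be imitated in a few lines. This is a minimal sketch, not the procedure of Table H itself: a pseudo-random generator with a hypothetical seed stands in for the printed table, the function name is invented for illustration, and the rule of skipping a serial number that has already been drawn is an assumption, since the text does not say how repeats are handled.

```python
import random

def draw_serial_numbers(n_needed, population_size, rng):
    """Read three-digit random numbers until n_needed usable serials are found."""
    chosen = []
    while len(chosen) < n_needed:
        number = rng.randrange(1000)      # one three-digit reading, 000 to 999
        if number >= population_size:     # no individual has this serial number
            continue
        if number in chosen:              # assumption: a repeated number is skipped
            continue
        chosen.append(number)
    return chosen

rng = random.Random(1965)  # hypothetical seed, standing in for a random entry point
serials = draw_serial_numbers(20, 400, rng)
print(" ".join(f"{s:03d}" for s in serials))
```

Readings from 400 to 999 are simply passed over, just as three-digit numbers with no corresponding serial number are passed over in reading the table.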


There are various ways of using a set of random numbers in sampling. 
The important precaution to observe is that of following a systematic 
pattern of selecting digits after the table is entered, or of making as 
many random entries as there are numbers needed; in other words, to
make sure that the randomness of the digits is permitted to prevail.

Let us examine the representativeness of the sample we have ob-
tained. The mean of our sample is about 41; the mean of the popula-
tion is about 40. Thus, the sample provides a good approximation to 
the mean value of the population. The sample is too small to provide 
dependable estimates of other population characteristics, such as form
of distribution and variability. (See Ex. 4.)
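
The sample mean of about 41 can be checked directly from the twenty scores listed for the sample. The sketch below is only a convenience for the arithmetic; the variable names are ours.

```python
from statistics import mean

# The twenty sample scores, read down each pair of columns in turn.
sample_scores = [
    56, 44, 40, 45, 40,
    28, 38, 54, 41, 30,
    22, 45, 28, 28, 64,
    39, 48, 38, 50, 37,
]

sample_total = sum(sample_scores)
sample_mean = mean(sample_scores)

print(len(sample_scores), sample_total, sample_mean)  # 20 815 40.75
```

The total of 815 divided by 20 gives 40.75, which rounds to the figure of about 41 quoted in the text.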


One of the commonest questions in sampling is that regarding 
adequate sample size. There is no simple answer to the question. 
Adequacy of size depends upon the sort of evidence being sought and 
the degree of reliability desired. The latter depends, to a large extent, 
upon the homogeneity of the population. The word homogeneity in 
statistics refers to the degree of similarity characterizing the individuals 
in the population in a given respect—e.g., similarity in respect to height 
or IQ. If the individuals were exactly alike in the given respect, the 
population would be perfectly homogeneous in that respect, and a 
sample of one individual would be adequate. Populations are never 
perfectly homogeneous, and the property of homogeneity is relative. 
It is quite obvious that the less homogeneous a population is, the larger 
a random sample needs to be in order to provide evidence of a given
degree of reliability. In later chapters, a great deal of attention will be 
given to the question of adequacy of sample size, in light of the sort of 
evidence sought and the homogeneity of the population. 
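
The bearing of homogeneity on sample size can also be seen by simulation. What follows is a sketch under invented conditions: two hypothetical populations of scores with the same mean of 100 but standard deviations of 5 and 15, an arbitrary seed, and an illustrative function name. Sampling is done directly from a normal distribution, which plays the part of an infinite hypothetical population.

```python
import random
from statistics import pstdev

rng = random.Random(11)  # hypothetical seed

def spread_of_sample_means(population_sd, sample_size, n_samples=1000):
    """Scatter of sample means when sampling from a normal population, mean 100."""
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(100.0, population_sd) for _ in range(sample_size)]
        means.append(sum(sample) / len(sample))
    return pstdev(means)

homogeneous = spread_of_sample_means(5.0, 25)       # less variable population
heterogeneous = spread_of_sample_means(15.0, 25)    # more variable, same sample size
larger_sample = spread_of_sample_means(15.0, 225)   # more variable, larger sample

print(f"SD 5,  n = 25:  {homogeneous:.2f}")
print(f"SD 15, n = 25:  {heterogeneous:.2f}")
print(f"SD 15, n = 225: {larger_sample:.2f}")
```

With samples of equal size the means from the less homogeneous population scatter more widely; a considerably larger sample is needed to bring them back to a comparable degree of reliability.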

In a real research problem, of course, we cannot examine repre- 
sentativeness and adequacy of size of the sample in the light of facts 
about the population, since these are the very facts we are attempting to 
determine from the sample. Extensive experience, however, has indi- 
cated that a random sample, particularly if it is large, provides an ap- 
proximate replica of the population. In a later chapter we shall see that 
a random sample, in addition to providing trustworthy evidence re- 
garding the population, provides a rational basis for estimating the 
amount by which a sample estimate may be in error. 

The method of sampling described above is known as simple 
random sampling, and a sample so selected is known as a simple random 
sample. There are various extensions of simple random sampling, two
of the most useful of which are stratified and cluster sampling. These
are well described by Wallis and Roberts (Ref. 51).

When sampling from a population in which the individuals cannot 
be listed, it is impossible to use random numbers or some other lottery 
method of selecting a sample. There is no altogether satisfactory way 
of selecting a sample in such situations. The best that the researcher 
can do, perhaps, is to make a determined effort to avoid bias and to 
report in detail the sampling method used and the composition of the 
sample which resulted. 


NONRANDOM SAMPLES 


Samples that result from other than random methods of selection 
are sometimes called accidental, incidental, or uncontrolled. They are 
here designated nonrandom, because that term is the most expressive 
of their nature. 

A great many of the experimental studies dealing with instructional 
methods involve such samples, as do the majority of the questionnaire 
investigations. In the former, the convenient intact group, perhaps 
the students in a certain class or room, is the sample used; in the 
latter, self-selected samples are established by the respondents who 
return questionnaires. 

It is observable that the majority of the samples used in research in 
the social sciences are nonrandom; and it is likely, because of ad- 
ministrative and other practical difficulties, that the practice cannot be 
avoided, at least in many instances. 

What can be said about the nonrandom sample? Since it cannot 
be considered to be representative of any known population, the 
information it yields, strictly speaking, does not permit generalization. 
Whether the findings hold for another group can be determined only 
by “try-and-see” procedure, which really means repeating the investiga- 
tion. It has been argued that the researcher can generalize to some 
imagined population which would be fairly represented by his sample, 
but it is difficult to extract any real sense from the argument. 

However, it would be incorrect to conclude that the study of a 
nonrandom sample is without significance. The investigation may be 
worthwhile, both because the sample evidence may be important in 
itself and because the investigation may suggest significant problems 
and hypotheses for more extended and general study. Furthermore, 
there is always the possibility that a nonrandom sample is adequately 
representative of other groups, so that what has been observed will 
have some generality. Because of this possibility, a nonrandom sample 
should be described in detail with respect to the factors which may 
have influenced the findings. In applying the findings from a non- 
random sample to another group, we necessarily proceed by analogy, 
i.e., we reason from the particular to the particular. Such reasoning is 
sound only when there are real similarities and no crucial differences 
between the particulars involved. Complete description of the 
nonrandom sample is needed in order to compare it with other groups to 
which the findings may be applied and thereby to judge whether the 
conditions of sound analogy are met. 

We shall return to the question of judging the reliability and sig- 
nificance of sample evidence in a later chapter. Before we can deal with 
these fundamental issues, it is necessary to develop several statistical 
concepts relating to frequency distributions and the measurement of 
their important properties. As he learns about these, the student should 
cultivate the habit of thinking about a given set of facts as a sample of a 
much larger body of facts which could be obtained if time and money 
were lavished. The student is urged also to remember that no formal 
rules can take the place of experience and common sense in the selection 
and interpretation of samples. 


MISUSE OF STATISTICS 


Everyone has heard such gibes as “There are little liars, big liars, 
and statisticians,” “You can prove anything provided you use statistics,” 
and “Statistics supports many mistaken things including 
statisticians.” It must be admitted that statistics in the hands of the 
neophyte is an uncertain tool, and in the hands of the propagandist, a 
dangerous one. Most exhibits of numerical data admit of more than 
one kind of analysis, and if a person sets out to analyze any given 
evidence according to some point he wishes to establish, he probably 
will be able to construct an argument plausible enough to fool at least 
the ignorant. 


Most of the arguments against the use of statistics in social research 
can be reduced to the single one that statistics can be and frequently 
is misused. Throughout the following chapters considerable attention 
will be given to the proper use of statistics; at this time, it is desirable 
only to preview and to illustrate briefly several of the more flagrant 
misuses. 


A great many of the misuses of statistics arise through the following 
practices: 


a. Using an average value to represent a set of numerical data when the 
average obscures important features of the data. For example, if the 
yearly salaries of three workers are $20,000, $4,000, and $3,000, re- 
spectively, the arithmetic mean, $9,000, would obscure more informa- 
tion than it would convey. If the mean were the only information 
reported, the data would in effect be falsified. (See Ex. 9a and 9b. 
See also Chap. III.) 

b. Ascribing the characteristics of a collection of numerical data to a par- 
ticular case. In any statistical study, a great deal of caution must be 
exerted in the interpretation and application of the findings. In the 
example above, the average man cannot be said to earn $9,000 a 
year in a real sense. To illustrate further, the generalization that stu- 
dents with higher IQ's make better academic records cannot be ap- 
plied to an individual student without careful qualification and appre- 
ciable uncertainty. The generalization is reliable and important, but 
a great many variables, some not yet measurable, affect a particular 
student's achievement. Failure to recognize that group character- 
istics cannot be applied indiscriminately to individual members of the 
group is the principal source of stereotyped thinking. 

c. Making comparisons without reference to all of the pertinent data. 
To say that city workers are better paid than rural workers has little 
meaning unless cost of living is taken into account. Sensible interpre- 
tations of wages cannot be made independently of prices. In further 
illustration, the economic value of a college education cannot be 
established by comparing salaries of college men with salaries of non- 
college men. Other circumstances, such as original wealth, social 
status, and intelligence, confound such comparisons. (See Ex. 9d.) 

d. Making comparisons without a clearly defined base or when the base 
is changing. For example, the trend of heart disease, largely an old- 
age affliction, is frequently charted without reference to the changing 
age structure of the population. Comparisons of the incidence of 
mental illness at present with that of 50 years ago are frequently made 
without reference to the fact that diagnosis and classification have 
changed radically. It has been pointed out (Ref. 4) that one of the 
reasons for the apparent increase of lung cancer is that lung cancer 
has not been detected as such until relatively recent times. It is im- 
possible, of course, to draw firm conclusions from comparisons when 
the bases are changing. 

e. Inferring causation because of association. Statistical method cannot 
demonstrate a causal relationship between two variables; it can only 
provide a measure of the amount of association. Whether two 
variables, such as education and income or cigarette smoke and lung 
cancer, are related as cause and effect is a question which demands 
information beyond the fact of association. We return to this topic 
in connection with correlation and regression in a later chapter. 

f. Making unwarranted inferences from a sample: overgeneralizing. 
The inferences about a population drawn from sample evidence are 
subject to sampling error. The evidence provided by even a large 
representative sample must be interpreted in light of possible error. 
The information provided by a small or nonrandom sample may pro- 
vide no useful generalization at all. The most dramatic example of 
making unwarranted inferences is that of the classic Literary Digest 
poll of 1936. Unfortunately, few unwarranted inferences are so 
quickly detected or so soundly ridiculed. (See Ex. 9e.) 

g. Drawing conclusions from basically inaccurate or unsound data. 
Statistical manipulation of facts does not improve their quality. The 
data from a typical personality inventory, for example, remain suspect 
no matter how many statistical operations they are put through. 
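
Practice (a) above can be checked with a few lines of arithmetic. The sketch below uses the three salaries given in that item. The median is brought in only for contrast and is an addition of ours, since it has not been introduced at this point in the text.

```python
import statistics

salaries = [20_000, 4_000, 3_000]   # the three yearly salaries from item (a)

mean_salary = statistics.mean(salaries)       # arithmetic mean
median_salary = statistics.median(salaries)   # middle salary, for contrast

# How many of the workers actually earn less than the mean?
below_mean = sum(1 for s in salaries if s < mean_salary)

print(mean_salary, median_salary, below_mean)
```

Two of the three workers earn far less than the mean, which is the sense in which the mean alone would misrepresent these data.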


Any thoughtful student can detect such misuses and fallacies as 
those listed above; however, not all misuses and misinterpretations are 
so easy to detect. Some of the more subtle abuses of statistics are 
discussed in References 22 and 36. In the best of hands, statistical analysis 
of numerical data, like any other attempt to gain reliable knowledge, 
is not infallible; even competent statisticians may disagree about the 
treatment and meaning of some data. The method of statistics con- 
stitutes a necessary but not sufficient condition of drawing sound con- 
clusions from mass numerical data. To ascribe to any numerical fact, 
such as an average IQ or price index, one and only one possible in- 
terpretation would be sheer numerology. No research tool can take the 
place of a mind in its user. As we study the more technical aspects of 
statistics, however, we shall find that statistics not only aids in collecting 
and analyzing evidence, but also is the most effective tool we have in 
determining whether evidence is fairly collected and properly analyzed. 
Moreover, since they explicitly demand unbiased and reliable evidence, 
statistical methods tend to be self-correcting. 

It is one of the aims of a course in statistics to bring about a 
thoughtfully critical attitude regarding conclusions purportedly based 
upon facts. Generally speaking, such an attitude coupled with common 
sense and an understanding of elementary statistical method enables a 
person to detect misuses of statistics and to reason well about statistical 
data. It has been said that, to a large extent, the correct use of statistics 
depends upon common sense and simple arithmetic. 


Statistical Data 


The beginning student is likely to have some difficulty in his reading 
and communicating due to the lack of a standard vocabulary in sta- 
tistics. In a statistical study, the variables that are being investigated 
and the numerical facts relating to them are designated by various 
names. In the interest of ease and clarity of communication, it is de- 
sirable to define here several of the more common statistical terms and 
to illustrate their uses. 


DATA 


The term data (plural of datum, meaning fact) is an all-embracing 
term used to designate the evidence or facts which describe a group or a 
situation and from which inferences or conclusions are drawn. Nu- 
merical facts such as heights, weights, scores on educational tests, prices 
of goods, crop yields, salaries, school enrollments, numbers of people 
voting for presidential candidates, and so on, conveniently may be 
referred to as statistical data or just data. The broad meaning of data 
makes the term extremely useful, provided, of course, its referents are 
clear. 



PHENOMENON; VARIABLE; VARIATE 


Statistics is often referred to as the study of phenomena which are 
characterized primarily by variation. In the most general sense, sta- 
tistical data consist of the numerical observations made of varying 
phenomena. The term phenomenon refers to some aspect of the environ- 
ment, such as weather, a human trait or activity, or an economic or 
social circumstance, which can be observed or measured. In statistics 
and in science, phenomenon does not imply something extraordinary or 
prodigious, as it frequently does in common usage; it refers simply to 
an object or event capable of being perceived. 

The term variable refers to a phenomenon which shows variation 
from place to place, from time to time, or from object to object. Any 
phenomenon that can be investigated statistically is a variable. 

The student will find that some writers use the term variate inter- 
changeably with variable, while others use it to refer to a particular 
value of a variable—e.g., a single IQ among the several which result in 
measuring intelligence in a group. Thus, the plural variates may mean 
different variables or may mean the different values of a given variable. 
Although the meaning of the word ordinarily is clear in context, we will 
avoid its use hereafter. 

There are two broad classes of variables: (1) those which vary in 
amount and (2) those which vary in kind or quality. The observations of 
a variable of the first class are fundamentally different from the observa- 
tions of a variable of the second class, as will be brought out below. 


STATISTICAL SERIES 


A set of numerical facts or items originating in the observation of a 
single variable commonly is called a statistical series, or just a series. 
The term is a fortunate one, for it sums up the salient features of a set 
of facts that can be dealt with statistically, namely, a common char- 
acteristic and variation in size or kind. 

When the items in a series vary in size, they are capable of being 
classified in order of size and therefore constitute a quantitative series. 
A quantitative series is continuous if the values of its items differ by 
amounts which are indefinitely small, or if they theoretically would so 
differ if measured more precisely. Examples of continuous series are 
chronological ages or test scores of a group of individuals, temperatures, 
crop yields, and inches of rainfall. The items in such a series may differ 
theoretically by indefinitely small amounts. Continuous statistical series 
are always quantitative, but not all quantitative series are continuous. 



When the items in a series must be expressed in whole numbers, the 
series is discontinuous, or discrete. Discrete series usually are made up 
of items whose values have been determined by counting. Class sizes, 
school enrollments, census data, and the number of heads observed in 
repeated tosses of a set of coins are examples of discrete series. We shall 
see later that it is frequently useful to treat discrete series as continuous 
and vice versa. 

The items of a statistical series originating in the observations of a 
phenomenon which varies in kind or quality cannot be arranged from 
least to most under a single classificatory principle. For example, the 
numbers of people belonging to various religious denominations in a 
city cannot be grouped from least to most under a single heading. The 
variable is “church membership,” and the variation of the items is that 
of kind, namely, Baptist, Catholic, Episcopalian, and so on. Such a 
series is called qualitative. Other examples of qualitative series are those 
relating to occupation, race, sex, hair and eye coloration, political 
affiliation, and enrollments in different kinds of schools and colleges. 

In summary, the observations of a variable may yield either a 
quantitative or a qualitative series, depending upon whether the varia- 
tion is in amount or in kind. If the variation is in amount, the variable 
is considered to be quantitative; if in kind, qualitative. In statistics, 
qualitative variables frequently are referred to as attributes. As will be 
shown later, the statistical methods of studying attributes are necessarily 
different from those of studying quantitative variables, although 
the methods connect at several points. 


STATISTICAL ITEMS 


The items of a statistical series are variously designated. There is 
no one best or commonest name for them. The terms observations, 
scores, values, and measures are used synonymously to refer to the items 
of a quantitative series. When the series is continuous, the range over 
which the items spread is commonly referred to as the scale of values 
or the scale of scores. 

As has been emphasized, statistics deals with numerical facts. 
The items in a qualitative series are not capable of statistical treatment 
until they are entered as numbers under the categories characterizing 
the qualitative variable. When entered as a tally in a category, an item 
becomes an instance or a case of a quality of the variable. All of the 
cases in a category constitute the frequency in that category. Thus, 
frequencies in categories are the numerical facts characterizing a 
qualitative variable. 
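
The tallying of a qualitative series into frequencies can be sketched as follows. The eight observations are hypothetical and serve only to show how each item becomes a case in a category.

```python
from collections import Counter

# Hypothetical observations of the qualitative variable "church membership".
observations = ["Baptist", "Catholic", "Baptist", "Episcopalian",
                "Catholic", "Baptist", "Catholic", "Baptist"]

frequencies = Counter(observations)   # category -> number of cases tallied

for category, frequency in sorted(frequencies.items()):
    print(f"{category:<14}{frequency:>3}")
```

The frequencies, not the category names, are the numbers that can then be treated statistically.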



PROPORTION; PERCENTAGE 


Consider the qualitative series shown in Table 1.1. There are 5 
occupational categories, and the frequency in each category is shown. 
The total frequency (equal, of course, to the number of cases in the 
series) is 25. 

In comparing the frequency in a particular category with the total 
frequency, it is customary to take the ratio frequency divided by total 
frequency. The ratio of a frequency in a category to the total frequency, 
i.e., the relative frequency, commonly is called a proportion and is 
symbolized by p. Thus the proportion of skilled occupations in 
Table 1.1 is 7/25 or .28. 

TABLE 1.1 
Occupations of Fathers of 25 
Eighth-Grade Pupils 

OCCUPATION           FREQUENCY 
Professional             2 
Business                 6 
Skilled Labor            7 
Semiskilled Labor        6 
Unskilled Labor          4 

TOTAL                   25 


When a proportion is multiplied by 100 the product is a percentage 
symbolized as P. Proportions may always be transformed into per- 
centages without loss of meaning, but the converse is not true. Per- 
centages frequently are used to mean something other than a fractional 
part of a whole, a meaning which proportions, by definition, cannot 
have. 
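
The computation of a proportion and its percentage may be sketched as below, using the skilled labor frequency from Table 1.1. The function names proportion and percentage are ours and are meant only as an illustration.

```python
from fractions import Fraction

def proportion(frequency, total):
    """Relative frequency p: frequency in a category divided by total frequency."""
    return Fraction(frequency, total)

def percentage(p):
    """Percentage P: the proportion multiplied by 100."""
    return 100 * p

p_skilled = proportion(7, 25)        # skilled labor, Table 1.1
P_skilled = percentage(p_skilled)

print(float(p_skilled), float(P_skilled))
```

Exact fractions are used so that the arithmetic is not disturbed by decimal rounding.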


The Study of Statistics 


The several aims of an introductory course in statistics which have 
been implied in the preceding pages are stated below in terms of what 
the student is expected to acquire: 


a. An understanding of statistical method as applied in psychological 
testing and research and the ability to interpret statistical findings. 

b. The ability to compute various statistical measures, to analyze numer- 
ical data intelligently, and to report statistical data. 

c. A thoughtfully critical attitude toward statistical method and the 
ability to detect misuses of statistics. 

d. An appreciation of the value of statistics in dealing with research 
problems. 



MATHEMATICAL TRAINING NOT NECESSARY 


As a tool in research, statistics comprises methods of collecting and 
analyzing quantitative evidence relating to problems which cannot be 
dealt with successfully by intuition or by logical methods. Statistical 
thinking is essentially similar to scientific thinking. The central question 
is always, “What is the evidence and what can we learn from it?” 

Being quantitative, statistics leans heavily upon mathematics, both 
in theory and application. Although the mathematics required in 
application are largely arithmetic and easy algebra, parts of statistical 
theory require higher mathematics. The student who has had mathe- 
matical training is fortunate indeed. He will find both the application 
and theory of elementary statistics relatively simple. 

But the study of statistics cannot be limited to those trained in 
mathematics. Statistical methods are so generally needed and the 
statistical treatment and interpretation of observational data are so 
widely used that anyone is seriously handicapped without some knowl- 
edge of statistics. Students not trained in mathematics, although they will 
rarely become accomplished statisticians, can acquire all of the abilities 
listed above. Those who wish to review mathematics, as used in ele- 
mentary statistics, will find Walker's book (Ref. 49) extremely helpful. 


HOW TO STUDY STATISTICS 


Statistics tends to be difficult, not because it is mathematically 
complex, but because it involves a point of view and ideas which are 
new to most people. If the student is to acquire more than a superficial 
understanding and wooden use of statistics, a great deal of patient, 
independent work and hard thinking will be necessary. 

The writer's experience with beginning students has led to several 
convictions regarding effective methods of study. As a rule, the student 
should read carefully the material in a section before attempting to do 
the exercises and then reread the material after he has worked the 
exercises. The application of statistical techniques to particular problems 
not only develops skill in application but it also contributes enormously 
to the understanding of theory. 

Since statistical theory and technique usually relate to numerical 
facts, a sort of study which may be described as numerical checking, or 
numerical clarification, may be of aid to the student. As a simple 
illustration of numerical clarification, suppose the statement is made 
that the addition of a constant to each of N scores changes the sum of 
the original scores by an amount equal to N times the constant. This 
somewhat involved statement is easy to check or clarify numerically. 
Let three numbers, 2, 4, and 5, be the scores. The sum of the three is 11. 
Now suppose that we add the constant 10 to each of the three. We now 
have 12, 14, and 15, whose sum is 41. The difference between the sums 
is 30, and 30 is equal to N times the constant, i.e., 3 × 10. The above is 
equivalent to checking the algebraic equation Σ(X + k) = ΣX + Nk, 
in which Σ means sum; X takes the values 2, 4, and 5; N = 3; and 
k = 10. 
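
The same numerical check can be carried out by machine. The sketch below uses the scores 2, 4, and 5 and the constant 10 from the illustration above; the variable names are ours.

```python
scores = [2, 4, 5]   # X
k = 10               # the constant added to each score
N = len(scores)

left = sum(x + k for x in scores)    # sum of (X + k): 12 + 14 + 15
right = sum(scores) + N * k          # sum of X plus N times k: 11 + 30

print(left, right, left == right)
```

Changing the scores or the constant and rerunning the check is an easy way to try further cases.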

The student will find many opportunities for numerical checking 
and clarification. The manipulation of a simple, regular series, such as 
3, 4, 5, 6, 7, or of an irregular series, such as 0, 3, 5, 37, may help in 
clarifying a difficult concept. In later pages, numerical study will be 
suggested from time to time. 

We should remark, perhaps, that a numerical check does not con- 
stitute general proof. The check illustrated above shows only that the 
statement holds true for the numbers 2, 4, and 5 and the constant 10. 
However, if a general statement or formula is false, its falsity may some- 
times be detected by numerical check. 
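
Both remarks can be illustrated with a statement that is false in general. The statement used here, that the sum of the absolute values of the scores equals the absolute value of their sum, is a hypothetical example of ours and is not taken from the text.

```python
def statement_holds(scores):
    """Check the (false in general) claim: sum of |X| equals |sum of X|."""
    return sum(abs(x) for x in scores) == abs(sum(scores))

print(statement_holds([2, 4, 5]))    # the check passes for these scores
print(statement_holds([-2, 4, 5]))   # the check fails once a negative score is used
```

The first check passes without proving anything; the second exposes the falsity of the statement.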

There is another noteworthy aid to understanding statistical con- 
cepts. If the student has statistical data in which he has a personal inter- 
est and subjects the data to the various techniques to be described in the 
following pages, he usually gains rapidly in understanding. The writer 
provides numerous opportunities for application of techniques, but 
none will be as productive as those the student will make if he studies 
problems of his own choosing. 


COMPUTING AND ROUNDING 


Statistical work usually involves arithmetic computations, and 
questions regarding how many digits to retain in a result constantly 
arise. There are no simple or generally satisfactory answers to the 
questions. In the illustrative computations throughout the text and in 
the answers in the Appendix, appropriate numbers of digits are re- 
tained. These will serve as models. In computing, it is a good idea to 
carry along one or two more digits than will be retained in the answer, 
rounding off at the end. The conventions for rounding numbers in 
statistics are: 

a. In rounding whole numbers, the dropped digits are replaced by zeros, 
but in rounding decimal numbers dropped digits are not replaced. 
Example: Rounded to one-figure accuracy, 19 becomes 20, but .019 
becomes .02. 


b. If the value of the digit(s) to be dropped is less than 5, no change is 
made in the preceding digit. Example: Rounded to two-figure ac- 
curacy, 1,549 becomes 1,500, and .3333 becomes .33. 

c. If the value of the digit(s) to be dropped is greater than 5, the pre- 
ceding digit is increased by 1. Example: Rounded to four-figure 
accuracy, 66.666 becomes 66.67, and 5,112.51 becomes 5,113. 

d. If the value of the digit(s) to be dropped is exactly 5, no change is 
made in the preceding digit if it is even, but if it is odd it is increased 
by 1. Example: Rounded to three-figure accuracy, 25.650 becomes 
25.6, but 25.75 becomes 25.8. 
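
The four conventions above can be tried out with the sketch below. It is one possible implementation, not a method given in the text: it assumes Python's decimal module, takes each number as written text so that no binary rounding intrudes, and uses a helper name, round_to_figures, of our own choosing.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_to_figures(text, figures):
    """Round the number written in `text` to `figures` significant digits,
    rounding an exact 5 to the even digit as in convention (d)."""
    value = Decimal(text)
    if value == 0:
        return value
    # adjusted() is the power of ten of the leading digit (1 for 25.65).
    exponent = value.adjusted() - figures + 1
    unit = Decimal((0, (1,), exponent))   # one unit in the last place kept
    return value.quantize(unit, rounding=ROUND_HALF_EVEN)

# The examples given under conventions (a) through (d).
examples = [("19", 1), (".019", 1), ("1549", 2), (".3333", 2),
            ("66.666", 4), ("5112.51", 4), ("25.650", 3), ("25.75", 3)]

for text, figures in examples:
    print(text, "->", format(round_to_figures(text, figures), "f"))
```

Note that Python's built-in round works on binary floating-point values, so it may not reproduce convention (d) for decimal fractions such as 25.65; this is why the sketch uses decimal arithmetic instead.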


The effect of rounding obviously makes any number an approxi- 
mate number. When a number is rounded to 25.8, for example, 25.8 
represents some number between 25.75 and 25.85. When a number is 
rounded to 76, 76 represents some number between 75.5 and 76.5. 
Thus, the final digit of a rounded number or of any approximate number 
is inexact. 
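
The interval represented by a rounded number can be computed as half a unit in the last written place on either side. The sketch below assumes that every written digit is significant, so it should not be applied to whole numbers ending in placeholding zeros; the function name is ours.

```python
from decimal import Decimal

def bounds_of_rounded(text):
    """Return the interval of values that round to the number written in `text`."""
    value = Decimal(text)
    # Half of one unit in the last written place, e.g. 0.05 for "25.8".
    half_unit = Decimal((0, (5,), value.as_tuple().exponent - 1))
    return value - half_unit, value + half_unit

print(bounds_of_rounded("25.8"))
print(bounds_of_rounded("76"))
```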

Accuracy of numbers is usually described in terms of the number 
of digits, other than placeholding zeros, which they contain. The 
number 25.8 is said to have three-figure accuracy; the number 76 has 
two-figure accuracy. The numbers .0015 and .0060 have two-figure 
accuracy, since, of the four digits both contain, the first two are merely 
placeholding zeros. A number such as 25,000 may have two-, three-, 
four-, or five-figure accuracy, depending on how many zeros are place- 
holders. If the number is considered to have three-figure accuracy, the 
fact may be indicated by writing 25,000 or 250 × 10². As a rule, in 
whole numbers between 1 and 100, zeros are other than placeholders. 
On the other hand, in numbers containing three or more zeros, at least 
one zero is usually a placeholder. 


For comprehensive treatment of accuracy in computation and 
related problems, see References 42 and 49. 


EXERCISES 


1. Below are several of the perennial problems of the social sciences. Select 
one and discuss several ways of attempting to find a solution. 
a. Should the federal government aid private as well as public schools? 

b. Should efficiency rather than seniority be considered in salary 
schedules? 

c. Should school psychologists attempt psychotherapy? 

d. Should free medical care be provided for all? 


2. Suggest a problem which you believe can satisfactorily be dealt with 
(a) authoritatively, (b) traditionally, (c) intuitively, (d) factually. What 
are the advantages and limitations of each method? 

3. What are some populations the following samples might represent: 
(a) the students in a given psychology class, (b) 50 temperature reports, 
(c) 10 case studies of juvenile delinquents, (d) the preserved letters of 
Sigmund Freud? 


4. By use of random numbers, select 5 samples of 10 scores each from the 
population of 400 scores in Table B, Appendix. Tally the scores in the 
classes shown below, then combine the 5 samples to form a sample of 50. 
Is the sample of 50 more likely to be representative of the population than 
any one of the samples? Why? 

CLASS                    SAMPLE 
           1     2     3     4     5     Combined 
65-74 
55-64 
45-54 
35-44 
25-34 
15-24 
 5-14 
Total     10    10    10    10    10        50 


5. Apart from money and time considerations, under what conditions would 
a small sample be preferred to a large one? 

6. A single drop of blood from the fingertip can be relied on to give a fair 
blood count for an individual, but a random sample of 100 ten-year-olds 
cannot be relied on to give a fair estimate of the proportion of 
Philadelphia ten-year-olds in various mental age brackets. Explain. 

7. Do these statements mean the same thing: “The burden of proof of a 
representative sample is always on the investigator,” and “When an 
investigator generalizes his results from a sample to a population, it 
is his responsibility to indicate the logic and basis of the generalization”? 

8. A college instructor tried out a new method of teaching in his class and 
decided to use the new method thereafter. What is the sample? The 
population? 

9. Criticize the following, stating one or more fallacies in each. 


a. A charitable organization received 5 donations during the week con- 
sisting of $25,000, $10, $5, $5, and $1 and announced that the average 
donation was a little over $5,000. 

b. In a production line experiment, involving 5 workers and 2 methods 
of assembling an article, the length of time in minutes required by 
the workers under Method I was 20, 18, 18, 17, and 12; under Method 
II, 42, 30, 9, 7, and 7. It was concluded that Method I was superior. 

c. A researcher sent questionnaires to a random sample of 50, of which 
31 were returned. Of the 31, 26 favored a proposal. The researcher 
reported that 83.87 per cent of the sample favored the proposal and 
that consequently it was safe to conclude that at least 75 per cent in 
the population favored the proposal. 

d. (From the Philadelphia Inquirer, June 21, 1962.) The dollars and 
cents value of a college education is higher today than ever before. 
. . . In 1961, the average pay of a worker with less than eight years 
of schooling was $3483; that of a worker with eight years, $4750; 
. . . that of a worker with twelve years, $6102; . . . that of a worker 
with four or more years of college, $9530. 

e. A social science teacher had 120 students in 5 classes. He gave them 
a questionnaire on social attitudes, then invited them to accompany 
him on a tour of the slums. Thirty-five accepted the invitation. 
After the visit he gave a second questionnaire on social attitudes and 
found that the attitudes of the 35 were significantly changed. He 
concluded that visiting the slums changed social attitudes. 


10. Sir Francis Galton observed, “General impressions are never to be 
trusted. Unfortunately when they are of long standing they become 
fixed rules of life, and assume a prescriptive right not to be questioned. 
Consequently those who are not accustomed to original inquiry entertain 
a hatred and horror of Statistics. They cannot endure the idea of 
submitting their sacred impressions to cold-blooded verification.” 
Mention several “general impressions” capable of being studied statistically. 

11. By reference to the data of Table A, Appendix, illustrate each of the 
following: attribute, category, continuous series, discrete series, 
qualitative series, quantitative series, variable. 

12. Give original illustrations of the terms in Exercise 11, above. 

13. In what sense can scores on a psychological or educational test, 
ordinarily stated as whole numbers, be considered continuous? 

14. In each of the following, state whether the series is continuous or discrete: 


a. Mental ages of delinquents. 
b. Deposits during a day in a given bank. 
c. Number of times 12, 11, ..., 2 spots appear in rolling a pair of dice 
50 times. 
d. Sprinting times of participants in a 100-yard dash. 
e. Scores in an intelligence test. 


15. Give an illustration where college enrollments would constitute (a) a 
discrete quantitative series; (b) a qualitative series. 

16. What sort of statistical series would the responses to each of the 
following questionnaire items constitute: 

a. What is your age to the nearest birthday? 
b. What is your height? Your weight? 
c. What is your occupation? 
d. Are you married? 
e. What is your income? 
f. Have you ever had a major operation? If so, for what? 

17. What are the proportions or relative frequencies in the various 
occupational categories of Table 1.1? What is the sum of the relative 
frequencies? 

18. Check numerically to determine the meaning and possible incorrectness 
of the following: 

a. The square root of the product of two numbers is equal to the product 
of the square roots of the numbers. 
b. The square of the sum of three numbers is equal to the sum of the 
squares of the numbers. 
c. ax + by + cz = (a + b + c)(x + y + z). 

19. Numerically investigate the effect upon the sum of a statistical series of 
multiplying each number in the series by a constant. The effect upon 
the sum of a statistical series of decreasing each number in the series by 
a constant. 


CHAPTER II 


Organization and Presentation of
Statistical Data 


It is frequently the case that statistical data must be organized in 
tables or graphs before their meaning becomes clear. The process not 
only reduces the bulk of the data to comprehensible size, it brings into 
relief the resemblances and differences within and between classes of 
data and thus facilitates comparison. Moreover, tabulating or grouping 
data often simplifies the computation of the important summary sta- 
tistics considered in later chapters. 

In this chapter we shall consider the frequency distribution and 
graphs of the frequency distribution. These are important in inference
as well as in description; in fact, the concept of frequency distribution
is fundamental in statistical methods.


The Frequency Distribution 


One of the commonest and simplest of statistical tables is the 
frequency distribution. Although qualitative data, as we have seen, must 
be classified as frequencies in specified categories or qualitative classes,
the term frequency distribution, as used in statistics, ordinarily refers to 
the tabulation of quantitative data in classes that vary in size. 

When we are interested primarily in the manner in which the items 
in a long quantitative series vary in size, the frequency distribution is
an appropriate method of classification. Whether the series is discrete
or continuous, its frequency distribution has both theoretically and 
practically important aspects in statistics. The more elementary 
theoretical aspects will be touched on in this section; in later chapters, 
the distribution will be utilized to simplify several types of statistical 
computations. 


THE DISTRIBUTION OF DISCRETE SERIES 


To illustrate the frequency distribution of a discrete series, let us 
consider a rather typical empirical study of the operation of chance. 
Suppose we have tossed 5 coins 20 times and have observed the results: 


HTTTT TTTTT HHTTT HTTTT HHHTT
HHTTT HHHHT HHHTT HHHTT HHTTT
HHTTT HHHTT HTTTT HHHTT HHHHH
HHHTT HHTTT HHHHH HHHTT HHHTT


In this situation, the frequencies of 5 heads, 4 heads, 3 heads, and so on 
constitute the information of concern. This information may be con- 
veniently and informatively classified, as shown in Table 2.1. 

TABLE 2.1
Distribution of Heads on 20
Tosses of 5 Coins


HEADS    FREQUENCY OF
         OCCURRENCE
  5           2
  4           1
  3           8
  2           5
  1           3
  0           1

TOTAL FREQUENCY 20


The discrete values, 5 to 0, constitute the classes in this distribution. 
The frequencies in the classes are recorded in the right-hand column. 
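
The tally behind such a table can be sketched in a few lines of Python. This is only an illustration, not part of the original study: the variable names are hypothetical, and the listing of tosses is read with every outcome as a string of H's and T's.

```python
from collections import Counter

# The 20 tosses of 5 coins listed in the text (H = head, T = tail).
tosses = """
HTTTT TTTTT HHTTT HTTTT HHHTT
HHTTT HHHHT HHHTT HHHTT HHTTT
HHTTT HHHTT HTTTT HHHTT HHHHH
HHHTT HHTTT HHHHH HHHTT HHHTT
""".split()

# Count the heads in each toss, then tally how often each count occurs.
frequency = Counter(toss.count("H") for toss in tosses)

# Print the classes from 5 heads down to 0 heads, as in the table.
for heads in range(5, -1, -1):
    print(f"{heads} heads: {frequency[heads]}")
print("Total frequency:", sum(frequency.values()))
```

Each discrete value from 5 to 0 serves as its own class, so no grouping decision is needed.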

In further illustration, suppose a research worker has investigated 
the size of the families represented by 1,000 high school students. To 
make his data manageable, he would ordinarily first classify them in 
order of frequency of occurrence of different sizes of families. In other
words, he would construct a frequency distribution.




THE DISTRIBUTION OF CONTINUOUS SERIES 


The classification of measures of a continuous series is usually 
somewhat more involved than that of a discrete series. Consider the 
138 VAT scores of the college freshmen, shown in Table A, Appendix. 
As given in the table we can learn little from them. The scores might be 
arranged in descending or ascending order, and tally marks used to 
indicate the frequency of each score. If this were done, we would have 


SCORE    FREQUENCY
345      /
357      /
370      /
392      /
395      //
...
751      //
795      /


This arrangement would be one possible frequency distribution of the 
scores, but it would not give us the “best” view of the tendency of the 
scores to cluster around certain values, and very likely would be sub- 
stantially altered if we added the scores from some other freshman 
group. In short, such a distribution would be too detailed to give a 
comprehensive or stable picture of the performance of freshmen on the 
verbal aptitude test. Another objection to such a distribution, although 
of less importance, is that it would fail to simplify later computations 
to an appreciable extent. 

Now let us telescope the array by indicating merely the number of 
scores falling in the interval 330 to 359, 360 to 389, 390 to 419, and so 
on. This gives one or the other of the distributions shown in Table 2.2,
depending upon whether we indicate the score classes in ascending or 
descending order. Although either method may be used, the latter is 
the more common. 
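
As a rough sketch of this telescoping, the Python fragment below assigns each score to its class of 30 score points. It assumes integral scores and lower indicated limits divisible by the interval; the function name is hypothetical, and only the nine scores shown in the array above are used (395 and 751 each read as occurring twice), not the full set of 138 from Table A.

```python
from collections import Counter

def class_limits(score, interval=30):
    """Return the indicated limits of the class that contains `score`."""
    lower = (score // interval) * interval
    return lower, lower + interval - 1

# Illustrative scores only: those visible in the array in the text.
scores = [345, 357, 370, 392, 395, 395, 751, 751, 795]

# Tally the scores by class and list the classes in descending order.
distribution = Counter(class_limits(s) for s in scores)
for (lower, upper), f in sorted(distribution.items(), reverse=True):
    print(f"{lower}-{upper}: {f}")
```

With all 138 scores supplied in place of the nine shown here, the same tally would yield the class frequencies of the grouped distribution.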

The student may well ask why the particular selection of 330-359,
..., 780-809 classes was made in the present example. The answers to
the question will provide general suggestions for constructing the fre- 
quency distribution. For the data in hand, the selection 


a. Permitted 16 classes. Fewer than about 10 classes or more than about
20 tend to obscure the significant features of most collections of data.
Some writers defend the "about 10-20 classes" rule on the basis of
ease and accuracy of later computations. These are only secondary
considerations; the first is that of provision for a comprehensive
classification. (See Ex. 1.)

b. Permitted a convenient classification or grouping interval. Some
multiple of 5 is to be preferred as the grouping interval if its use
does not result in the violation of other desirable features.

c. Utilized lower indicated class limits 330, 360, ..., 780 which are
divisible by the grouping interval 30. This makes it somewhat easier
to tabulate the scores.

Another suggestion should be added to the above. The grouping
interval should be an odd number, provided its use does not result in
violation of other desirable features, and provided 10, 20, 30, or some
other multiple of 10 is not appropriate. Odd intervals simplify later
computations.


TABLE 2.2

Distribution of Scores of 138 Male
Liberal Arts College Freshmen on
College Board Verbal
Aptitude Test


SCORE      FREQUENCY      SCORE      FREQUENCY
330-359        2          780-809        1
360-389        1          750-779        2
390-419        6          720-749        3
420-449        7          690-719        6
450-479       14          660-689        7
480-509       24          630-659       12
510-539       16          600-629        8
540-569       14          570-599       15
570-599       15          540-569       14
600-629        8          510-539       16
630-659       12          480-509       24
660-689        7          450-479       14
690-719        6          420-449        7
720-749        3          390-419        6
750-779        2          360-389        1
780-809        1          330-359        2
           N = 138                   N = 138


The technical terms and ideas relating to the frequency distribution
are described below:


a. The grouping interval is called the class interval and is abbreviated i.
Although it is possible and sometimes useful to change the interval
for one or more classes, the resulting distribution demands special
interpretation and treatment. In general, i should be and is constant
for all classes.

b. The designations of classes—e.g., 330-359, 360-389, and so on—in 
Table 2.2 are called the indicated class limits or the expressed or 
written limits. Indicated class limits should be expressed as simply 
as possible, but must be stated with sufficient precision to prevent 
overlap. Thus, for the freshmen semester averages, Table A, Ap- 
pendix, if we use an interval of 5.0, we would need to indicate our 
lowest class as 50.0-54.9, or 50.5-55.4. If the measures to be tabu- 
lated are integral, the indicated class limits should be integral; if in 
tenths, the limits should be in tenths and so on. Classes are some- 
times indicated by their mid-value or midpoint. (See d below.) 

c. The real class limits (sometimes called implied, mathematical, or 
boundary class limits) are always understood to extend 1/2 unit above
and below the indicated class limits. Thus, the real limits of class 
330-359 are 329.5-359.5; the real limits of the class 50.0-54.9 are 
49.95-54.95; the real limits of 50.5-55.4 are 50.45-55.45. The in- 
dicated class limits are easy to write and easy to use in tabulation. 
The real limits are, however, the limits which must obtain if we are 
to consider our measures as continuous. Real limits, not indicated 
limits, are used in statistical computations. Expressed limits are for 
convenience only. 

d. The midpoint or mid-value of a class is defined by 1/2 of the sum of 
the indicated limits of that class (or, what is the same thing, 1/2 of 
the sum of the real limits). Thus, the midpoint of the 330-359 class 
is (330 + 359)/2 or 344.5. The midpoint of the class 50.0-54.9 is 
52.45. When the class interval i is an odd number, the mid-value 
will contain no more digits than the indicated class limits. If the 
latter are integral, the mid-value will be integral. 
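
The arithmetic of real limits and midpoints in items c and d can be checked with a short Python sketch. The function names are hypothetical, and the limits are passed as strings to the standard `decimal` module so that values such as 49.95 are computed exactly; `unit` is the unit in which the measures are recorded (1 for integral scores, 0.1 for tenths).

```python
from decimal import Decimal

def real_limits(lower, upper, unit):
    """Extend the indicated limits by 1/2 unit below and above."""
    half = Decimal(unit) / 2
    return Decimal(lower) - half, Decimal(upper) + half

def midpoint(lower, upper):
    """One-half of the sum of the indicated limits."""
    return (Decimal(lower) + Decimal(upper)) / 2

def class_interval(lower, upper, unit):
    """The interval i is the distance between the real limits."""
    low, high = real_limits(lower, upper, unit)
    return high - low

print(real_limits("330", "359", "1"))       # 329.5 and 359.5
print(real_limits("50.0", "54.9", "0.1"))   # 49.95 and 54.95
print(midpoint("330", "359"))               # 344.5
print(midpoint("50.0", "54.9"))             # 52.45
print(class_interval("330", "359", "1"))    # 30.0
```

Note that the interval is found from the real limits, so the class 330-359 has i = 30, not 29.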


The student may find that different writers follow different con- 
ventions, but the above conventions are easy to follow and entirely 
serviceable and logical. Obviously, a few variables such as ages taken 
as of last birthday yield measures which require special treatment in 
grouping. 

CONSTRUCTION OF THE FREQUENCY DISTRIBUTION

No definite rules for constructing a frequency distribution can be
given. Most collections of data a 
the grouping scheme which i 


them here. 


a. Find the range of the series to be grouped by subtracting the lowest
value from the highest.



b. Divide the range by the number of classes desired in order to gain 
an idea of the size of the class interval needed. As a general rule, 
if the series contains fewer than about 50 items, more than about 
10 classes are not justified; in fact, it is sometimes the case that the 
characteristics of a short series stand out better if fewer than 10 
classes are used. If the series contains from about 50 to 100 items, 
10 to 15 classes tend to be appropriate; if more than 100 items, 15 
or more classes tend to be appropriate. Ordinarily, not fewer than 
10 classes or more than 20 are used. 

c. If the range divided by the number of classes gives a quotient which 
is near 5 or some multiple of 5, use 5 or the multiple as the class in- 
terval; if not, select for tryout the odd number that is nearest the 
quotient. If neither provides an appropriate number of classes, then 
take the even number that is nearest the quotient as the class interval. 

d. Fix the lowest class and indicate the limits of all classes, according 
to the suggestions given in the preceding pages. It ordinarily is pre- 
ferable to fix class limits so that the lower expressed limits are divisi- 
ble by the interval. This tends both to save time in tabulating the 
data and to prevent mistakes. If the items tend to cluster around cer- 
tain values, however, the class limits should be fixed so that these 
values are at or near the midpoints of the classes. 

e. Make a tally sheet and tally the items as illustrated below. 


SCORE      TALLY MARKS    FREQUENCY
780-809    /                  1
750-779    //                 2
720-749    ///                3
690-719    ///// /            6


f. Make a table from the first and third columns of the tally sheet. 
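
Steps a through d can be sketched in Python as follows. The function names are hypothetical, and the reading of "near some multiple of 5" as simple rounding is an assumption, since the text gives no numerical tolerance. The example uses the lowest and highest VAT scores shown earlier (345 and 795) and 16 desired classes.

```python
import math

def candidate_intervals(lowest, highest, desired_classes):
    """Steps a-c: return the quotient and the two intervals to try first."""
    value_range = highest - lowest                    # step a
    quotient = value_range / desired_classes          # step b
    multiple_of_5 = max(5, 5 * round(quotient / 5))   # step c, first choice
    nearest_odd = 2 * math.floor(quotient / 2) + 1    # step c, second choice
    return quotient, multiple_of_5, nearest_odd

def class_count(lowest, highest, interval):
    """Step d: classes needed when lower expressed limits are divisible by i."""
    first_lower = (lowest // interval) * interval
    return (highest - first_lower) // interval + 1

quotient, by_five, odd = candidate_intervals(345, 795, 16)
print(quotient, by_five, odd)           # 28.125 30 29
print(class_count(345, 795, by_five))   # 16
```

The sketch only proposes intervals for tryout; whether the resulting classes bring out the features of the data remains a matter of judgment.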


It should again be emphasized that no thumb rule or mechanical 
pattern can be followed in making a frequency distribution. It is rarely 
the case that one particular grouping scheme can be said to be the best 
among the many possible. Moreover, a scheme appropriate for bring- 
ing out the stable characteristics of a given collection of data may not 
be appropriate to use in computing statistics, such as the mean, from 
the same data. All that can be said is that, in constructing a frequency 
distribution, there should be good reasons for the particular scheme
adopted.


Graphical Presentation of the Frequency Distribution 


Bar and line graphs are easily adapted to the frequency distribution.
Their use in this connection is important, both because they portray
the form of the distribution and because they simplify various
rather complex points in statistical theory. The three graphs most
commonly used are the histogram, frequency polygon, and cumulative
frequency curve.


THE HISTOGRAM 


A histogram is essentially a bar graph of a frequency distribution. 
Its purpose is to show the frequencies within classes graphically. Con- 
sider the frequency distribution of Table 2.2. We may mark off real 
limits on the scale of scores and construct bars or rectangles whose bases 
are the class intervals and whose heights are equal to the frequencies in 
the respective classes. This procedure results in the histogram shown in 
Figure 2.1. It will be noticed that, since the heights of the rectangles are 
proportional to the frequencies and the bases of the rectangles are equal, 
the areas of the rectangles correspond to the respective class frequencies 
and the total area of the histogram to the total frequency, N, of the 
distribution. 

There are several ways of labeling the class intervals at the base of 
the histogram, the commonest being that of indicating the midpoints. 
The writer recommends, however, that the student follow the practice 
of indicating the real limits, as illustrated in Figure 2.1, until he becomes
thoroughly accustomed to thinking about the sides of the rectangles of
the histogram as always erected at the real limits of the class intervals.

[Fig. 2.1. Distribution of scores of 138 college freshmen on College Board
Verbal Aptitude Test. (From Table 2.2.) Horizontal scale: Score, marked at
the real limits of the classes; vertical scale: Frequency.]

If a histogram of a frequency distribution of a discrete series is
drawn, it is necessary to consider the classes as extending 1/2 unit
above and 1/2 unit below their discrete values. A histogram of the data
of Table 2.1 would be constructed as shown in Figure 2.2.

[Fig. 2.2. Distribution of heads on 20 tosses of 5 coins. (From Table 2.1.)
Horizontal scale: Heads; vertical scale: Frequency.]

In the discrete series histogram, it is customary to indicate classes 
by labeling their midpoints, but here, as for continuous series, the 
student must remember that the sides of the rectangles in the histogram 
are always drawn at the real limits of the classes. 

Although the histogram can be, and occasionally is, used in pre- 
senting frequency distributions in research reports, it is primarily useful 
as an aid to understanding statistical method. The simple but never- 
to-be-forgotten fact that the areas of the rectangles of the histogram 
correspond to frequencies in classes and the total area to the total fre- 
quency of the distribution simplifies a great deal of statistical theory. 


THE FREQUENCY POLYGON 


If the midpoints of the upper bases of the rectangles in a histogram 
are connected, the resulting figure is called a frequency polygon, as
illustrated in Figure 2.3.

It is customary to connect the midpoints of the upper bases of the 
extreme left-hand and extreme right-hand rectangles to the midpoints 
of the adjacent zero-frequency classes, and thus to close the polygon. 
It is left as an exercise for the student to show that the area of the closed 
polygon is equal to the area of the histogram, and that consequently 
the area of the closed polygon of a distribution corresponds to the total 
frequency N. Thus, either the area of the histogram or the area of the 
closed polygon may be thought of as graphically representing the total 
frequency of a distribution. 
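
A numerical check of this equality, which is not a substitute for the general demonstration asked of the student, can be sketched in Python. The frequencies are those of the distribution of 293 intelligence quotients (Table 2.3), lowest class first; the variable names are hypothetical.

```python
interval = 7
# Frequencies of the IQ distribution, from the 49-55 class up to 126-132.
freqs = [1, 2, 2, 17, 27, 64, 61, 53, 31, 23, 11, 1]
# Class midpoints: 52, 59, ..., 129.
midpoints = [52 + interval * k for k in range(len(freqs))]

# Histogram: each rectangle has base i and height f.
histogram_area = sum(interval * f for f in freqs)

# Close the polygon at the midpoints of the adjacent zero-frequency classes.
xs = [midpoints[0] - interval] + midpoints + [midpoints[-1] + interval]
ys = [0] + freqs + [0]

# Area under the polygon: sum of trapezoids between consecutive vertices.
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
)

print(histogram_area, polygon_area)   # 2051 2051.0
```

Both areas equal i times N, here 7 x 293, so either figure represents the total frequency on the same scale.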



[Fig. 2.3. Histogram and frequency polygon of a distribution of 293 intelligence
quotients. (From Table 2.3.) Horizontal scale: Intelligence quotient, marked at
the real limits 48.5, 55.5, ..., 132.5; vertical scale: Frequency.]


The frequency polygon may also be thought of as a line graph 
showing how frequency within class varies as class intervals take on
successively higher values on the scale of scores. In the polygon of 
Figure 2.3, for example, the fact that the frequencies in successive classes 
increase up to the 83.5-90.5 class and then gradually decrease is clearly 
depicted. Such interpretation is correct, because the height of the poly- 
gon at any vertex corresponds to the frequency in the class whose midpoint 
is directly below the vertex. 

It is not necessary, of course, to construct the histogram before 
constructing the frequency polygon. If dots are placed above the mid- 
points of successive class intervals at a distance proportional to the 
frequencies, the polygon may be drawn without the histogram. In 
practice, the frequency polygon is always constructed this way. 


THE CUMULATIVE FREQUENCY CURVE 


It is possible to portray a frequency distribution by the cumulative
frequency curve. The nature of this curve can be made clear by an
example. Consider the frequency distribution of the IQ's of the
eighth-graders as given in Table 2.3. It will be noted that a column
"cumulative frequency" is included at the right of the frequency column.
The cumulative frequency up to the 56-62 class is 1; to the 63-69 class,
3; to the 70-76 class, 5; ... ; and to the 126-132 class, 292; the entire
cumulative frequency, of course, being 293, the total frequency N. The idea
of cumulative frequency up to each successive class means that if we are
to treat the IQ's as continuous, we shall have to utilize real limits.
"Up to" the 70-76 class, for example, must mean up to 69.5. The
cumulative frequency curve of the distribution of IQ's is shown in
Figure 2.4. To construct the curve, dots are plotted above the lower real
limits of the classes at heights corresponding to cumulative frequencies
up to those limits and connected by straight lines. The curve has its
origin at the lower limit of the lowest class, i.e., the height of the curve
is zero at this point, since there is zero frequency below the lowest class.


TABLE 2.3
Intelligence Quotients of 293
Eighth-Grade Pupils


             FREQUENCY    CUMULATIVE FREQUENCY
IQ               f              CUM f
126-132          1               293
119-125         11               292
112-118         23               281
105-111         31               258
98-104          53               227
91-97           61               174
84-90           64               113
77-83           27                49
70-76           17                22
63-69            2                 5
56-62            2                 3
49-55            1                 1
            N = 293


[Fig. 2.4. Cumulative frequency curve of a distribution of 293 intelligence
quotients. (From Table 2.3.) Horizontal scale: Intelligence quotient, marked at
the real limits 48.5, 55.5, ..., 132.5; vertical scale: Frequency.]


The number of scores falling below a given value on the scale of 
scores may readily be estimated from the cumulative frequency curve. 
For example, suppose we wish to estimate the number of IQ's falling
below 100 in the illustrative distribution. We find the point on the 
curve in Figure 2.4 directly above 100, go across horizontally to the 
frequency scale at the left, and read 195, approximately. 
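
Because the curve is made of straight segments between real limits, reading it is equivalent to linear interpolation within a class. A minimal Python sketch follows; the function name is hypothetical, and the frequencies are those of the distribution of 293 IQ's, lowest class first.

```python
from itertools import accumulate

interval = 7
lowest_real_limit = 48.5
# Frequencies of the IQ distribution, from the 49-55 class up to 126-132.
freqs = [1, 2, 2, 17, 27, 64, 61, 53, 31, 23, 11, 1]

# Real limits 48.5, 55.5, ..., 132.5 and the cumulative frequency up to each.
limits = [lowest_real_limit + interval * k for k in range(len(freqs) + 1)]
cum = [0] + list(accumulate(freqs))

def frequency_below(value):
    """Estimate the number of scores below `value` by linear interpolation."""
    if value <= limits[0]:
        return 0.0
    if value >= limits[-1]:
        return float(cum[-1])
    k = int((value - limits[0]) // interval)      # class containing `value`
    fraction = (value - limits[k]) / interval     # share of that class below it
    return cum[k] + fraction * freqs[k]

print(round(frequency_below(100), 1))   # 192.9
```

The computed value, about 193, agrees with the graphical reading of roughly 195 to within the precision with which the curve can be read.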

We shall return to the cumulative frequency curve in Chapter V. 
There we shall see that when the cumulative frequencies are changed to
percentages of total frequency, a cumulative percentage curve of wide
usefulness may be constructed.


CONSTRUCTION OF HISTOGRAMS, FREQUENCY POLYGONS, AND 
CUMULATIVE FREQUENCY CURVES 


Most of the rules generally to be observed in the construction of 
these figures have already been mentioned. It should be emphasized 
again, perhaps, that the real limits of the class intervals are used in the 
construction of the histogram and the cumulative frequency curve, and
the midpoints in the construction of the polygon.

As for all graphs, clear and comprehensive titles should be included 
beneath the figures. The source of the frequency distribution should 
always be cited. Both horizontal and vertical scales should be clearly 
labeled. The vertical sides of the rectangles of the histogram may be 
omitted if the form of the distribution, rather than the frequencies in 
particular class intervals, is being emphasized. 

As a rule, if the height of the highest rectangle is about 3/4 of the
total width of the histogram, the histogram will be more pleasing to the
eye than if very different proportions are used. The same can be said
about the ratio of the overall height to the overall width of the polygon.
The cumulative frequency curve should ordinarily make, on the average,
an angle of 40° to 50° with the scale of scores.




Types of Frequency Distributions 


The great majority of frequency distributions encountered in
psychological measurements have in common an important characteristic.
This characteristic is idealized in the statement, the more extreme a
deviation from the average value, the less frequently it appears. The
frequency polygons of such distributions tend, of course, to have single
peaks and to slope more or less uniformly downward from the peaks.

The single-peaked or bell-shaped type of distribution was found to 
characterize errors of observation in the physical sciences near the be- 
ginning of the nineteenth century, but it was not until the latter part of 
the century that the distribution was found to characterize the measure- 
ments of certain economic and social variables. The Belgian statistician, 
Quetelet, appears to have been the first person to advance the idea that 
mass observational data, from various sources, tended to be distributed 
according to the “law of error." The great English scientist, Sir Francis 
Galton, was so impressed by the tendency that he wrote (Ref. 18, p. 66): 


I know of scarcely anything so apt to impress the imagination as the 
wonderful form of cosmic order expressed by the “Law of Frequency 
of Error." The law would have been personified by the Greeks and 
deified, if they had known it. It reigns with serenity and in complete 
self-effacement amidst the wildest confusion. The huger the mob and 
the greater the apparent anarchy, the more perfect is its sway. It is the 
supreme law of Unreason. Whenever a large sample of chaotic elements
are taken in hand and marshalled in the order of their magnitude,
an unsuspected and most beautiful form of regularity proves to
have been latent all along.


Both Quetelet and Galton believed that most physical and mental vari- 
ables, when reliably and appropriately measured, would be found dis- 
tributed according to the “normal curve of error," or approximately so. 


THE NORMAL DISTRIBUTION 


The normal frequency distribution, whose smoothed polygon is the 
so-called normal curve, is the backbone of statistical theory, and we 
shall consider its elementary theoretical and practical applications at 
some length in later chapters. In this section, we wish only to note 
some of its characteristics as a type of distribution that empirical data 
frequently tend to follow. Since such data are rarely, if ever, entirely 
normal in form, the normal curve is a mathematical ideal. 

Let us examine the shape of the frequency polygon of a normal 
distribution. The 400 normally distributed scores of Table B, Appendix, 
are grouped in Table 2.4. The histogram and polygon of the normal 
distribution of Table 2.4 are shown in Figure 2.5. 
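
The two features that matter most here, a single peak and symmetry about it, can be verified directly from the grouped frequencies. The Python sketch below is only an illustration with hypothetical variable names; it takes the class frequencies of Table 2.4 from the 8-12 class upward and reads the 58-62 frequency as 11, which the stated total of 400 requires.

```python
# Frequencies for the classes 8-12, 13-17, ..., 68-72.
freqs = [1, 4, 11, 26, 49, 69, 80, 69, 49, 26, 11, 4, 1]

total = sum(freqs)
peak = freqs.index(max(freqs))          # position of the modal class (38-42)

# Symmetry: the frequencies read the same from either end.
is_symmetric = freqs == freqs[::-1]

# Single peak: frequencies rise steadily to the peak and fall steadily after it.
rises = all(a < b for a, b in zip(freqs[:peak], freqs[1:peak + 1]))
falls = all(a > b for a, b in zip(freqs[peak:], freqs[peak + 1:]))

print(total, peak, is_symmetric, rises and falls)   # 400 6 True True
```

Empirical distributions rarely pass so exact a test; the check simply shows what the idealized form looks like in grouped data.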

If a smooth curve were sketched to fit the polygon in Figure 2.5 as 
closely as possible, and to approach but not touch the base line, the 
curve would closely resemble the normal curve. The important things to
note about the shape of the curve are its symmetry and its degree of
peakedness, both of which are quite similar to those of an outline of a
typical bell.


TABLE 2.4

Normal Distribution of 400 Scores
(From Table B, Appendix)


SCORE    FREQUENCY
68-72         1
63-67         4
58-62        11
53-57        26
48-52        49
43-47        69
38-42        80
33-37        69
28-32        49
23-27        26
18-22        11
13-17         4
8-12          1


As was noted above, empirical data rarely, if ever, yield a truly 
normal distribution. They tend to show systematic departure from 
normal form either in respect to symmetry or to peakedness, or to both, 
although in many instances the departure can be considered the result
of sampling fluctuations and not, therefore, necessarily in contradiction
to an assumption of a normally distributed population.

[Fig. 2.5. Histogram and frequency polygon of a normal distribution. (From
Table 2.4.) Horizontal scale: Score, marked at the real limits 7.5, 12.5, ...,
72.5; vertical scale: Frequency.]

[Fig. 2.6. A negatively skewed distribution. (Scores of 47 subjects on a
generalization test.) Horizontal scale: Score, marked from 64.5 to 154.5;
vertical scale: Frequency.]



SKEWED DISTRIBUTIONS 

The word skewed means “lacking symmetry" or “distorted.” The 
meaning of skewness as applied to frequency distributions can be most 
clearly brought out by illustrations. 

The frequency polygon of Figure 2.6 is particularly lacking in 
symmetry on the left side and is considered to have negative skewness. 
The polygon of Figure 2.7 shows positive skewness. Distributions 
showing a systematic departure from normal curve symmetry are said
to be skewed. In such distributions, the variation of the measures is
considerably greater near one end of the scale than the other.

[Fig. 2.7. A positively skewed distribution. (A subject's speed of response to
58 mental test items.) Horizontal scale: Seconds, marked from 7 to 52;
vertical scale: Frequency.]


LEPTOKURTIC AND PLATYKURTIC DISTRIBUTIONS 


The word kurtosis refers to the relative “width of shoulders" or 
"degree of peakedness" of a frequency distribution, that of the normal 
distribution being described as mesokurtic (mesos means “middle” or 
"medium"). Relatively high and narrow distributions are described as 
leptokurtic; relatively flat-topped distributions as platykurtic. The two 
types of distributions are illustrated in Figures 2.8 and 2.9. Either type, 
of course, may show skewness as well as nonnormal peakedness. 

Since the apparent kurtosis of a frequency polygon is affected by 
the choice of the dimensions used in its construction, the departure of a 
given frequency distribution from normality with respect to peakedness 
is more difficult to detect by inspection than is skewness. In later chap- 
ters we shall consider quantitative methods of describing skewness and 
kurtosis and methods of determining whether the departure from 
normality in a given distribution is too great to be reasonably ascribed 
to sampling fluctuations. 


OTHER FORMS OF DISTRIBUTIONS 


In the above paragraphs, we have considered only those distributions
whose polygons have single peaks with sides sloping downward
from the peaks. Some empirical data show two or more peaks, some suggest
J-type curves, some U-type curves, and some tend to show little or no
regularity. In research work the type of distribution in the sampled
population is usually of concern. Many of the statistical procedures taken
up in the following pages assume population normality and may give
misleading results if the assumption is untenable.

[Fig. 2.8. A leptokurtic distribution. (Length in centimeters of 100 books
selected at random from a library shelf.) Horizontal scale: Length, marked
from 17.45 to 27.45; vertical scale: Frequency.]

[Fig. 2.9. A platykurtic distribution. (Scores of 293 subjects on a
problem-solving test.) Horizontal scale: Score, marked from 0.5 to 28.5;
vertical scale: Frequency.]

At this point in his study of statistics, the student is encouraged to 
develop an attitude of thoughtful skepticism regarding easy and hasty 
assumptions of normality. At the same time, he should note the rather 
remarkable frequency with which distributions of sample data suggest 
that the population distribution is more or less normal. In later pages 
we shall have more to say about assumptions of normality, and shall 
suggest several special methods of dealing with data from markedly
nonnormal populations.

EXERCISES

1. Tally the first 46 of the VAT scores, Table A, Appendix, in the classes
330-344, 345-359, ..., 780-794. In a parallel frequency column, tally
the next 46, and in a third column, the last 46. Now tally the first 46
in the classes 300-399, 400-499, 500-599, 600-699, and 700-799; the
next 46 in a parallel frequency column; and the last 46 in a third
frequency column. Comment on the comprehensiveness, informativeness,
and stability of the two grouping schemes.

2. The open-end distribution is one in which the lowest or highest class
interval has no specified range; e.g., in a distribution of IQ's the highest
class may be indicated "140 and above." What sort of data necessitate
the open-end distribution?

3. Suggest appropriate grouping schemes for the chronological ages, 
Regents' language scores, and Regents' averages of Table A, Appendix. 
To prevent confusion in tabulation, how would the classes for the 
Regents' averages have to be indicated ? 

4. Each row of the following chart refers to a grouping scheme for a set 
of data. Fill in the blanks. 


CONSECUTIVE 


SIZE OF SCORES 
CLASS INCLUDED IN REAL EXPRESSED 
INTERVAL MIDPOINT INTERVAL LIMITS LIMITS 
2 88.5 
3 66 64.5-67.5 
s 48-52 
7 T 
10 90-99 
5 3.7 
04 2.20-2.23 


5. Explain the statement, “The “typical” frequency distribution of test 
scores supports the idea that the more extreme a score is, the less fre- 
quently it tends to occur.” 

6. Show that the area of the “closed” frequency polygon is equal to the
area of a histogram. 

7. The distributions of the VAT and MAT scores, Table A, Appendix, are
shown below. Construct frequency polygons of both on a common
scale of scores. Construct cumulative frequency curves on a common
scale. Does either graph emphasize features of the distributions not
emphasized by the other? Explain.


SCORE       VAT    MAT
780-809       1
750-779       2      1
720-749       3      2
690-719       6      3
660-689       7      8
630-659      12     12
600-629       8     13
570-599      15     17
540-569      14     18
510-539      16     18
480-509      24     17
450-479      14     16
420-449       7     10
390-419       6      2
360-389       1      1
330-359       2

TOTAL       138    138


8. What characteristic of a frequency distribution accounts for an
S-shaped cumulative frequency curve? Find or invent a distribution whose
cumulative frequency curve is not S-shaped.

9. If the scores in the classes of a frequency distribution neither fall at the
class midpoints nor scatter evenly over the class intervals, errors of
grouping are present.

a. Why would errors of grouping tend to be compensating over the
entire distribution?
b. Can you think of a situation where errors of grouping would not be
compensating?
c. As the grouping interval is made smaller, what is the effect on errors
of grouping?

10. What type of distribution would be likely to characterize age of people
at time of marriage?

11. Suppose a 50-item test of low difficulty were administered to a group of
100 students. What type of distribution most likely would characterize
the scores? Suppose the items were of high difficulty?

12. Suppose the amount of pupil or worker tardiness in a school or factory
were recorded by noting the number of tardies and the minutes of
each. If the number of tardies were plotted on a vertical scale and
minutes on the horizontal, what do you believe would be the shape of
the distribution?


CHAPTER III 


Characteristics of Statistical 


Series. Central Tendency 


We have seen that the first step in the reduction and description 
of a long quantitative series is that of classifying the values in a fre- 
quency distribution and constructing frequency diagrams, such as the 
histogram and the polygon. 

It is usually the case that statistical work involves a comparison 
of one series with one or more others or with theoretical values, such 
as standardized test norms. For example, when we have in hand such 
series as those shown in Table 3.1, we ordinarily would wish to know
how the schools compare as to performance on the educational test, 
or how the entire sample of 293 eighth-grade pupils compares with 
eighth-grade pupils of previous years in the same city or with pupils 
in other cities, or with eighth-grade pupils at large, as given by state 
or regional norms for the test. The questions we attempt to answer by 
statistical methods usually involve comparisons of two or more series.

When the series to be compared are classified in frequency dis- 
tributions or are graphically depicted in histograms or frequency curves, 
points of similarity and difference may be noted roughly by inspection. 
If we inspect the distributions of Table 3.1 and the polygons of Figure 
3.1, we note that the distributions of Schools E and F seem to represent 




TABLE 3.1 


Distributions of Arithmetic Fundamentals Test Scores 
in Samples of Eighth-Grade Pupils in Ten Schools 


                SCHOOL                 ALL
SCORE | A B C D E F G H I J | SCHOOLS
51-53 1 1 
48-50 pj «3 1 3 
45-47 1 0 3 tD 5 
42-44 0 4 & 5 Ó Q 1 19 
39-41 | 1 Ot 4 7 i 2 4 i 18 
36-38 | 2 £ B E Q 35.2 2. 1 31 
33-35 9 a F $5 & 3 2. Ll 1 4 37
30-32 | 4 S 4 p 2 4 4 8 4 8 49
27-29 | d 7 & 6 3 à 4 5 2 A 41
24-26 | 2 0 5 8 1 1 3 3 4 4 31
21-23 | 1 2 & 3 oO j 2 L O 3 19
18-20 0 4 3 1 b 2 2 2 i 16 
15-17 i 2 4 4 1 12 
12-14 L 0 2 1 4 
9-11 2 1 0 3 
6-8 0 1 1 
3-5 3 3 
NUMBER | 23 18 38 37 35 35 32 29 17 29 293 


the best performances on the test and the distribution of School G 
perhaps the poorest. But we further note both that there is consider- 
able overlap of the distributions and that each distribution has several 
unique features. Comparisons by inspection tend to be inexact and 
inconclusive, and it is difficult to obtain agreement regarding their 
meaning. As a rule, quantitative methods of comparing frequency 
distributions are more satisfactory than graphical methods. 

Quantitative series may differ in one or more of four important 
respects: (1) average value of the items, or central tendency; (2) the 
scatter of the items about the average value, or variability; (3) degree 
of asymmetry in the scatter of the items, or skewness; and (4) the extent 
to which the items are concentrated in the neighborhood of the average 
value, or kurtosis. In practical work, two or more comparable series 
ordinarily differ to some extent in all four respects, if only because of 
sampling fluctuations.

In order to compare two or more series exactly, we need measures
of their four major characteristics. There are situations in which variability,
skewness, and kurtosis, particularly if the series are short, are
not considered; and, at the other extreme, situations in which significant
detail would be lost if we attempted to describe series only in terms of
their four major characteristics. In the former situations, the implicit
assumption is that the variability, skewness, and kurtosis of the series
are not so different as to invalidate the comparisons; in the latter, it
may be doubted whether exact comparisons are desirable. Ordinarily,
we may consider measures of central tendency, variability, skewness,
and kurtosis necessary and sufficient in describing and comparing quantitative
series. These measures, in addition to providing the basis for
exact comparisons of two series, are indispensable in the analysis and
interpretation of a single series.

Fig. 3.1. Frequency polygons of distributions A(-), B(----), E(- -) and
G(-- -) from Table 3.1. (Class frequencies are expressed as relative frequencies
or proportions.) Horizontal axis: score. Vertical axis: relative frequency.


MEASURES OF CENTRAL TENDENCY 


It was seen in the previous chapter that the items in a quantitative 
series, classified in a frequency distribution, tend to cluster about a 
point somewhere between the extremes, and the tendency again is seen 
in the table and figure above. This tendency commonly is referred to 
as the central tendency of the series, and the point about which the 
items tend to cluster is called a measure of central tendency. 




A measure of central tendency is a sort of average or typical value 
of the items in the series, and its function is to summarize the series in 
terms of this average value. It is a denominate quantity, being ex- 
pressed in the same unit as the items. 

Since the central tendency of observational data tends to be rela- 
tively stable from sample to sample taken from the same population, 
a measure of central tendency has more than summarizing and descrip- 
tive usefulness. As will be seen in Chapters X and XI, measures of 
central tendency are frequently used in determining whether two sam- 
ples differ so greatly as to discredit the hypothesis that their parent 
populations are alike in average value. 

There are three measures of central tendency in common use: 
mode, median, and arithmetic mean. Since each of these measures, or 
"averages," involves somewhat different techniques and interpretation, 
each will be treated separately in the following pages. As he learns 
about each, the student should keep in mind that no single number 
can adequately describe a statistical series, and that consequently two 
or more series cannot be fairly compared on the basis of average values 
alone. In the past, educational workers in particular have tended to be 
preoccupied with average values to the exclusion of the other important
features of their data.


The Mode 

The mode may be defined as the item which occurs most fre- 
quently in a statistical series. When a particular type of wearing ap- 
parel, such as a suit or dress, is worn more frequently than other types 
during the fall season, that particular type of apparel is referred to as 
the mode for the fall season. If the majority of college graduates enter 
professional careers, the modal occupation of college graduates is pro- 
fessional. If we have the ten scores, 24, 22, 20, 30, 29, 22, 26, 28, 22, 25, 
the modal score is 22, since 22 is the score appearing most often. In the 
series 20, 21, 24, 25, there obviously is no modal score. The abbrevia- 
tion commonly used for the mode is Mo. 
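
The two small series just given can be checked with a short sketch. The function name `modes` and the rule of reporting no mode when every score occurs only once are illustrative assumptions, chosen to agree with the examples in the text.

```python
from collections import Counter


def modes(scores):
    """Return the score(s) occurring most often, or [] when no score repeats."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        # Every score occurs exactly once, so no score is modal.
        return []
    return sorted(s for s, c in counts.items() if c == highest)


print(modes([24, 22, 20, 30, 29, 22, 26, 28, 22, 25]))  # [22]
print(modes([20, 21, 24, 25]))                           # []
```

Returning a list rather than a single number leaves room for series in which two or more scores tie for the greatest frequency.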

THE MODE IN A FREQUENCY DISTRIBUTION


The mode* in a frequency distribution is considered to be at the 
midpoint of the class interval containing the greatest number of cases. 


* The mode as defined here is sometimes called the crude or empirical mode to 
distinguish it from the mathematical mode. In statistical theory, the latter is defined 
as the abscissa corresponding to the highest point of a theoretical frequency curve. 




In Table 3.1, the mode for the distribution of scores in School D is 25, 
the midpoint of the interval 24-26, which contains the greatest number 
of scores. The mode for the School F distribution is 43. The School E 
distribution does not have a single mode, but two modal values, one 
at 37, the other at 43. We shall consider distributions having no sin- 
gle mode under the next heading. The “All-Schools” distribution in 
Table 3.1 has a marked mode at 31. 
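
As a rough illustration, the crude mode of the “All-Schools” distribution can be located mechanically. The sketch below is only one way of doing so; the function name `crude_modes` and the data layout are assumptions made for the example, and the frequencies are those of the All-Schools column.

```python
# All-Schools frequencies, lowest class (3-5) first; each class is 3 score units wide.
FREQS = [3, 1, 3, 4, 12, 16, 19, 31, 41, 49, 37, 31, 18, 19, 5, 3, 1]
LOWEST_MIDPOINT = 4   # midpoint of the 3-5 class
WIDTH = 3


def crude_modes(freqs, lowest_midpoint, width):
    """Return the midpoint(s) of the class(es) holding the greatest frequency."""
    top = max(freqs)
    return [lowest_midpoint + width * k for k, f in enumerate(freqs) if f == top]


print(crude_modes(FREQS, LOWEST_MIDPOINT, WIDTH))  # [31]
```

The function reports only classes tied for the single greatest frequency; a distribution with two separated peaks of unequal height would still need to be judged by inspection.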


USES OF THE MODE 


The mode, which indicates the item of greatest frequency in a 
series, is an average widely used in everyday life. When newspapers 
use the term “on the average” they are usually referring to an out- 
standing or conspicuous tendency, and not to an arithmetic average. 
The mode is an average that is easy to understand, easy to determine, 
and one that best depicts the typical size of the items in a series. The 
fact that the mode can be determined by inspection favors its use as a 
rough index of the central tendency of the frequency distribution. 

The mode is not affected by extreme scores, and we do not need 
to know the extreme scores in a series to determine the modal score. 
For example, if the modal salary of college presidents is desired, we 
need not know the highest and lowest salaries paid to college presi- 
dents to determine the modal salary for the group. 

The mode of a distribution, more than other averages, is affected 
by changes in the grouping scheme, and is subject to wide fluctuations 
from one sample to the next. It is not at all reliable in small samples. 
The mode suffers the additional disadvantage of being incapable of 
algebraic treatment. For example, the mode of one distribution cannot 
be combined with the mode of another to determine the mode of the 
combined distributions. When the number of cases in a distribution is 
small, or in situations calling for fine grouping, the mode may have 
little meaning. If the modal salary of 50 workers is $5,000, for example, 
but if 45 of the 50 receive salaries different from $5,000, it would be 
absurd to report a mode. 


The concept of mode as a measure of central tendency is chiefly
useful because it invites attention to bimodal data. The presence of
two distinct modes in a distribution casts doubt on the homogeneity of
the data. As we have seen, the basis for quantitative classification is size,
the data being considered alike in kind or quality. The presence of two
modes suggests that a qualitative difference may underlie the data and
that more exact and useful analyses may be possible if this difference is
taken into account. Although it is not always possible to locate the
reasons for bimodality, careful analysis of data in which two modes
appear should ordinarily be made. (See Ex. 2 and Ex. 3.) Whether or
not the reasons can be located, both modes should be noted. It is not
good practice to report a single average where two distinct modes exist.


The Median 


A second measure of the central tendency of a statistical series is 
known as the median. By definition, the median is that point on the 
scale of scores below which one half of the scores lie and above which 
one half of the scores lie. Hence, the median, by virtue of its middle 
position, characterizes the central tendency of a series. 

When the scores are ungrouped, the median is defined as the middle 
score. If, for example, we have the five scores 8, 12, 15, 16, 19 arranged 
in order of size, the mid-measure is 15. If we have an even number of 
ungrouped scores arranged in order of size, the mid-measure is cus- 
tomarily defined as the point halfway between the two middle scores. 
Thus, in the series 6, 8, 10, 13, 14, 16, the mid-measure is (10 + 13)/2 
or 11.5. 
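
The rule for ungrouped scores can be sketched as follows. The function name `median_ungrouped` is an assumption for illustration; the two series are those used in the text.

```python
def median_ungrouped(scores):
    """Middle score of an odd-length series; halfway point of the two
    middle scores of an even-length series."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([8, 12, 15, 16, 19]))     # 15
print(median_ungrouped([6, 8, 10, 13, 14, 16]))  # 11.5
```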

COMPUTATION OF MEDIAN OF GROUPED DATA 


A graphical illustration will serve to clarify the computation of the 
median of grouped data. As was seen in Chapter II, the area of the 
histogram is proportional to the number of scores in a distribution. 
Since the median by definition is the point on the scale of scores below 
(or above) which one half of the scores lie, a vertical line drawn through 
the median will bisect the histogram, and conversely, the vertical line 
which bisects the histogram will pass through the median. With this in 
mind let us examine the histogram of Figure 3.2. 

The line which bisects the histogram of 37 scores must pass through
a point on the scale below which 18.5 scores lie. Since 14 scores lie
below the 26.5-29.5 class, the line will have to mark off an area equivalent
to 4.5 scores in that class in order to mark off 18.5 scores in all.
If we assume that the 6 scores in the 26.5-29.5 class are distributed
evenly over the interval (a necessary assumption in computing the
median), the location of the line is easily determined. The 6 scores are
distributed over an interval of 3 units (from 26.5 to 29.5). Hence, each
score corresponds to 3/6 unit and the needed 4.5 scores correspond to
4.5 × 3/6 or 2.25 units. When we add 2.25 to 26.5, we obtain 28.75
as the value of the point below which 18.5 scores lie. Hence, 28.75 is
the median, or the point below which one half of the scores in the
distribution lie.

Fig. 3.2. The median in a histogram. (School D, Table 3.1.) [Histogram:
frequency plotted against the scale of scores, real class limits 17.5 to 50.5,
with a vertical line marking Median = 28.75.]


It is not necessary, of course, to construct a histogram in order to
determine the median of a frequency distribution. We need only to

a. Determine the number of scores or cases below the class in which
the median falls.

b. Subtract this number from N/2.

c. Divide the difference by the number of cases in the class containing
the median.

d. Multiply the quotient by the class interval.

e. Add the product to the lower real limit of the class in which the
median falls.


This is exactly the procedure which was followed in computing the
median of the distribution represented by the histogram in Figure 3.2.
The procedure may be summarized in the formula

$$\mathrm{Mdn} = L + \left(\frac{N/2 - F}{f}\right) i \tag{3.1}$$

in which the abbreviation Mdn indicates the median; L is the lower
real limit of the class containing the median; N is, as always, the total
number of cases in the distribution; F is the total number of cases
in the classes below the class containing the median; f is the number
of cases in the class containing the median; and i is the class interval.

TABLE 3.2

Computation of Median of Grouped Data
by Formula
(Data from Table 3.1)

SCORE      f     cf
51-53      1    293
48-50      3    292
45-47      5    289
42-44     19    284
39-41     18    265
36-38     31    247
33-35     37    216
30-32     49    179
27-29     41    130
24-26     31     89
21-23     19     58
18-20     16     39
15-17     12     23
12-14      4     11
9-11       3      7
6-8        1      4
3-5        3      3

N = 293

COMPUTATION OF MEDIAN, USING FORMULA (3.1)
N = 293, N/2 = 146.5
Since median falls in 30-32 class,
L = 29.5, F = 130, f = 49, i = 3
Mdn = 29.5 + ((146.5 - 130)/49)(3)
    = 29.5 + 1.01
    = 30.51

In finding the median when N is large, it is helpful first to set up a 
cumulative frequency (cf) column, as illustrated in Table 3.2, at least 
up to the class in which the median falls. If the column is completed, 
a check, cf = N, on addition throughout is provided.

The application of the formula for the median is illustrated in 
Table 3.2. The student is advised to study the graphical illustration of 
computing the median and to rely upon understanding rather than upon 
the formula in computing medians. 
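
For readers who wish to check the arithmetic, formula (3.1) can be sketched in a few lines. The function names and the data layout below are assumptions made for illustration. The sketch presumes equal class intervals and an even spread of scores within each class, and it does not handle the special case of a median falling on a class limit next to an empty class.

```python
def median_formula(L, N, F, f, i):
    """Formula (3.1): Mdn = L + ((N/2 - F) / f) * i."""
    return L + (N / 2 - F) / f * i


def grouped_median(classes, i):
    """classes: (lower real limit, frequency) pairs, lowest class first."""
    N = sum(f for _, f in classes)
    F = 0  # number of cases below the class being inspected
    for L, f in classes:
        if f > 0 and F + f >= N / 2:
            return median_formula(L, N, F, f, i)
        F += f
    raise ValueError("the distribution has no cases")


# All-Schools distribution of Table 3.2, lowest class (3-5) first.
freqs = [3, 1, 3, 4, 12, 16, 19, 31, 41, 49, 37, 31, 18, 19, 5, 3, 1]
all_schools = [(2.5 + 3 * k, f) for k, f in enumerate(freqs)]

print(round(grouped_median(all_schools, 3), 2))      # 30.51
print(median_formula(L=26.5, N=37, F=14, f=6, i=3))  # 28.75
```

The second call reproduces the School D value read from the histogram: 14 cases below the 26.5-29.5 class, 6 cases in it, and 37 cases in all.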

Occasionally the student will come across a frequency distribution 
in which the median falls exactly on a real class limit. If an adjacent 
class contains zero frequency, as illustrated below, 

SCORE 


45-49 
40-44 
35-39 
30-34 
25-29 
20-24 
15-19 


NUAODUN a 




the median is usually considered to be the midpoint of the class interval 
having zero frequency. In the illustration the median is considered to 
be 32.0 rather than 29.5. If two adjacent classes have zero frequency, 
the median may be considered to be the real limit between the two 
zero-frequency classes, although in the latter situation, at least, there 
would be present presumptive evidence of bimodality, and consequently 
doubt regarding the fairness of a single measure of central tendency. 

The median of a frequency distribution of discrete data is computed 
exactly as the median of a frequency distribution of continuous data. 


USES OF THE MEDIAN 


The median of a distribution is the point below (or above) which 
one half of the values lie. Since the median of the distribution in 
Table 3.2 is 30.51, we know that the number of pupils having scores 
below 30.51 is equal to the number of pupils having scores above that 
point. A score below 30.51 is “below average" in the sense of being 
one of the scores in the lower half of the distribution. 

In general, the median is easily understood and has several ad- 
vantages as an average. When a series contains either a few extremely 
high or a few extremely low scores relative to the majority of scores in 
the series, the median is perhaps the most representative average avail- 
able, for it is not affected by extreme scores. When averages of such 
data as salaries, costs of homes, days lost by workers because of illness, 
and ages of people at time of marriage are needed, the median generally 
is to be preferred. When the central tendency of an open-end distribu- 
tion, i.e., a distribution having a bottom or top interval of unspecified 
length, is desired, the median is the most reliable measure that can be 
computed.

The median, like the mode, is a nonalgebraic measure, and medians 
of separate distributions cannot be combined to give the median of the 
combined distribution. It has the further disadvantage of being less
dependable than the arithmetic mean, a point which will be discussed in
the next section.


The Arithmetic Mean 


In many cases when we wish to find the average of a set of scores,
we simply divide the sum of the scores by the number of scores in the
set. The result is popularly called the “average”; however, in statistics
it is designated arithmetic mean in order to distinguish it from other
averages. In discourse, the term arithmetic mean usually is shortened
to mean.

The arithmetic mean is defined as the sum of the values in a series
divided by the number. Using X_1, X_2, X_3, ..., X_N to represent the
values of the respective N items in a series, the definition may be written

$$M = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}.$$

The definition may be stated more simply

$$M = \frac{\sum X}{N} \tag{3.2}$$

in which ΣX* is the sum and N the number of the items in the series.
The symbol Σ always refers to sum in statistics.

The mean of a short series is easily found. All that we have to do 
is sum the scores and divide by the number, N. However, this procedure 
has two limitations. First, it tends to be laborious when the series is 
long, unless a desk calculating machine is available. Second, it cannot 
be applied to a frequency distribution. Hence, we need to consider 
methods of finding the mean of grouped data—methods that will 
ordinarily reduce labor when the series is long. 


THE MEAN OF GROUPED DATA 


When data are grouped in a frequency distribution, we may use 
either of two methods of finding the mean. Look at the distribution 
in Table 3.3 of the scholastic aptitude test scores (SAT) of 138 college 
freshmen. The midpoints of the class intervals are designated X'. Since
the scores have lost their identity, we must assume that they are dis- 
tributed evenly in the intervals or, what amounts to the same thing, that 


* The items in a quantitative series commonly are designated X, it being
understood that the numerical value of X may vary from item to item. The expression
ΣX indicates summation or addition of all of the items. The symbols M and X̄
are used interchangeably to indicate the arithmetic mean. If the more precise
notation X_i is used, the formula for the mean is written

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N},$$

which states explicitly that all of the items in the series are summed.




TABLE 3.3

Computation of the Mean of Grouped Data
(Data from Table A, Appendix)

SAT SCORE X    MID-VALUE OF     f        fX'
               CLASS X'

1450-1499        1474.5         1      1474.5
1400-1449        1424.5         3      4273.5
1350-1399        1374.5         4      5498.0
1300-1349        1324.5         5      6622.5
1250-1299        1274.5         6      7647.0
1200-1249        1224.5        14     17143.0
1150-1199        1174.5        16     18792.0
1100-1149        1124.5        14     15743.0
1050-1099        1074.5        24     25788.0
1000-1049        1024.5        22     22539.0
950-999           974.5        12     11694.0
900-949           924.5        11     10169.5
850-899           874.5         3      2623.5
800-849           824.5         1       824.5
750-799           774.5         2      1549.0

SUM                           138    152381.0

M = 152381.0/138 = 1104.2


they have the mid-values of their respective classes. Under this assump- 
tion, we have one score of 1474.5, three scores of 1424.5, and so on. To 
find the mean, we multiply the mid-values of the classes by the fre- 
quencies, sum the products, and divide by 138, as shown in the table. 
This procedure may be summarized in the formula, analogous to (3.2), 


$$M = \frac{\sum fX'}{N} \tag{3.3}$$


in which f is the frequency in a class and X’ is the class midpoint. 

The mean of the distribution of Table 3.3 is 1104.2. The student 
can verify that the mean of the 138 ungrouped SAT scores in Table A, 
Appendix, is 1103.9. Here, as is usually the case, the mean of a dis- 
tribution differs slightly from the mean of the scores before grouping. 
The discrepancy is due to errors of grouping, i.e., the failure of the 
scores to be distributed evenly in the classes or to have, on the average, 
the mid-values of the classes. However, the errors tend to be compen- 
sating, and ordinarily the mean of a distribution is very close to the 
mean of the ungrouped scores. 
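
The computation by formula (3.3) can be sketched as below. The variable and function names are assumptions for illustration; the midpoints and frequencies are those of the SAT distribution of Table 3.3, entered from the lowest class upward.

```python
# SAT distribution of Table 3.3, lowest class (750-799) first.
FREQS = [2, 1, 3, 11, 12, 22, 24, 14, 16, 14, 6, 5, 4, 3, 1]
MIDPOINTS = [774.5 + 50 * k for k in range(len(FREQS))]


def grouped_mean(midpoints, freqs):
    """Formula (3.3): M = (sum of f * X') / N, with N = sum of f."""
    n = sum(freqs)
    total = sum(x * f for x, f in zip(midpoints, freqs))
    return total / n


print(sum(FREQS))                                # 138
print(round(grouped_mean(MIDPOINTS, FREQS), 1))  # 1104.2
```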




A second and somewhat easier method of finding the mean of 
grouped data is systematized in Table 3.4. The method consists essen- 
tially of selecting the midpoint of a class as a reference point or arbi- 
trary origin and expressing the deviations of the midpoints of the other 
classes from that origin in unit steps, d, as illustrated in the table. 


TABLE 3.4

Computation of the Mean of Grouped Data
(Data from Table 3.3)

SAT SCORE X      f      d      fd
1450-1499        1      8       8
1400-1449        3      7      21
1350-1399        4      6      24
1300-1349        5      5      25
1250-1299        6      4      24
1200-1249       14      3      42
1150-1199       16      2      32
1100-1149       14      1      14
1050-1099       24      0       0
1000-1049       22     -1     -22
950-999         12     -2     -24
900-949         11     -3     -33
850-899          3     -4     -12
800-849          1     -5      -5
750-799          2     -6     -12

SUM            138             82

Computation of mean by formula (3.4):
AO = 1074.5, Σfd = 82, i = 50, N = 138
M = 1074.5 + (82/138)50 = 1104.2


Again we must assume that the scores in a class have the mid-value
of that class. When the class midpoints are equispaced, any midpoint
may be expressed as

$$X' = \mathrm{AO} + di,$$

where AO is the arbitrary origin, d the deviation of the class from AO,
and i the class interval. In Table 3.4, for example, in which 1074.5 is
the arbitrary origin, the midpoint of the top class, 1474.5, may be
expressed as 1074.5 + 8(50); the midpoint of the next to lowest class,
824.5, may be expressed as 1074.5 + (-5)(50); and so on. The student
can satisfy himself that the relationship X' = AO + di holds throughout
the table.




When we substitute this expression for X' in formula (3.3), we have

$$M = \frac{\sum f(\mathrm{AO} + di)}{N},$$

from which, since the constant AO is summed N times and divided by
N, we obtain

$$M = \mathrm{AO} + \left(\frac{\sum fd}{N}\right) i. \tag{3.4}$$

The application of formula (3.4) is illustrated in the space below Table
3.4. Notice that the mean is in exact agreement with the mean computed
in Table 3.3. The two methods necessarily give identical results.
The latter method is sometimes called the coded method, since the d
units really are coded values of the midpoints of the class intervals.
When the class intervals of a frequency distribution are equal, the mean
of the distribution may be found by the coded method with a minimum
of work and a minimum of danger of computational error. Let us
summarize the procedure in the following set of directions:


a. After a frequency distribution having equal class intervals is set up,
select an arbitrary origin at the midpoint of a class interval and code
this class "0" in a d column.

b. Code the next higher class "1" in the d column, the second higher
"2," and so on. Code the next lower class "-1," the second lower
"-2," and so on. Be sure that the d values are increasing in the
same direction as the class midpoints.

c. Multiply the d values by their respective frequencies f and enter the
products in an fd column.

d. Find Σfd, the algebraic sum of the fd column, divide Σfd by N, and
multiply the quotient by i, the class interval.

e. Add the result of step (d) to the midpoint of the class interval coded
"0" to obtain the mean of the distribution.

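The directions for the coded method can be sketched as follows. The function name `coded_mean` and the argument `origin_index` (the position, counting from the lowest class as 0, of the class coded "0") are assumptions made for illustration; the frequencies are those of the SAT distribution.

```python
# SAT distribution, lowest class (750-799) first; class interval i = 50.
FREQS = [2, 1, 3, 11, 12, 22, 24, 14, 16, 14, 6, 5, 4, 3, 1]
LOWEST_MIDPOINT = 774.5
WIDTH = 50


def coded_mean(freqs, lowest_midpoint, width, origin_index):
    """Formula (3.4): M = AO + (sum of fd / N) * i.

    Returns the mean together with the sum of the fd column.
    """
    n = sum(freqs)
    # d is the number of class steps between each class and the origin class.
    sum_fd = sum(f * (k - origin_index) for k, f in enumerate(freqs))
    origin = lowest_midpoint + origin_index * width  # AO
    return origin + sum_fd / n * width, sum_fd


mean, sum_fd = coded_mean(FREQS, LOWEST_MIDPOINT, WIDTH, origin_index=6)
print(sum_fd)          # 82
print(round(mean, 1))  # 1104.2
```

Choosing a different class as the arbitrary origin changes the fd column but not the mean, which is one convenient check on the work.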

USES OF THE ARITHMETIC MEAN 


The arithmetic mean is the most widely used measure of central 
tendency. Although usually somewhat more difficult to compute than 
the mode or median, its definition and meaning are easily understood. 
The mean perhaps best conveys the idea of average value, since it is 
derived from the exact values of the items in the series. 

The fact that the mean is based upon the sum of the values in a
series enhances its usefulness in some situations. If we have a set of
independent observations of the same thing (e.g., the ratings of several
judges of an individual or a set of measurements of a dimension or
property of an object), the mean is extremely useful. When it can be
shown, as is often the case, that the errors in a set of observations tend
to be compensating, the mean of the observations is relatively free from
error.

But in other situations the fact that the mean is affected by the 
value of each item works against its fairness as a measure of central 
tendency. As was previously noted, when a series includes a few items 
of either high or low values relative to the values of the majority, the 
mean is not a fair measure of central tendency. If, for example, the 
yearly incomes of six lawyers in a small town were $25,000, $6,000, 
$6,000, $5,000, $5,000 and $4,000, we would not ordinarily be satisfied 
with the mean $8,500 as representative of the average salary of lawyers 
in the town. To liken the prospects of practicing law in that town to
the prospects of practicing law in a town in which all lawyers earned
between $8,000 and $9,000 yearly would be grossly misleading. 
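
The effect of the single large income can be seen by setting the mean beside the median. The sketch below uses the standard-library `statistics` module; the variable name is illustrative.

```python
from statistics import mean, median

incomes = [25000, 6000, 6000, 5000, 5000, 4000]

print(f"{mean(incomes):.0f}")    # 8500
print(f"{median(incomes):.0f}")  # 5500

# Five of the six incomes lie below the mean.
print(sum(1 for x in incomes if x < mean(incomes)))  # 5
```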

The mean is applicable to series of any length, from two values
upward. It is an algebraic function of the values, and this property
adds enormously to its usefulness in statistical work. If we have two
or more subgroups, the mean of the total group can readily be determined
from the numbers and means of the subgroups. In Exercise 11,
the student is asked to prove the formula

$$M_T = \frac{N_1 M_1 + N_2 M_2}{N_1 + N_2}, \tag{3.5}$$

in which M_T is the mean of the total group, N_1 and M_1 are the number
and mean, respectively, of the values of one group, and N_2 and M_2
are those of the other. The formula can easily be extended to more
than two series. The mode and median of combined series cannot be
determined from those of the separate series.
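
A numerical check of the combining rule may be helpful, although it does not replace the proof asked for in the exercise. The function name `combined_mean` is an assumption, and the two small groups are chosen only for illustration.

```python
def combined_mean(groups):
    """groups: list of (number of values, mean) pairs, one per subgroup."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Two illustrative groups of scores.
group_1 = [4, 9, 12, 15]       # N = 4, M = 10
group_2 = [8, 12, 15, 16, 19]  # N = 5, M = 14

from_summaries = combined_mean([(4, 10), (5, 14)])
pooled = group_1 + group_2
from_raw_scores = sum(pooled) / len(pooled)

print(round(from_summaries, 2), round(from_raw_scores, 2))  # 12.22 12.22
```

Because the function accepts any number of pairs, the same sketch covers the extension to more than two series.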

The mean can be used to obtain an average value of a series after
each item is weighted. For example, suppose we wish to combine the
three test scores 60, 72, and 85 of a student into a composite score, and
suppose we wish the last score to count four times as much as the
first and the second to count twice as much as the first. The weighted
mean is

$$M_{wt} = \frac{(60 \times 1) + (72 \times 2) + (85 \times 4)}{1 + 2 + 4} = 77.7.$$

The unweighted mean of the scores is (60 + 72 + 85)/3 or 72.3.
There are a great many situations in which a weighted arithmetic mean
may be useful. Whenever a composite or "average" must be derived
from values of unequal importance or of unequal reliability, the
weighted mean is appropriate, provided a system of weights can be
worked out and justified.
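
The weighted and unweighted means of the three test scores can be verified with a brief sketch; the function name `weighted_mean` is an illustrative assumption.

```python
def weighted_mean(values, weights):
    """Sum of value-times-weight products divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [60, 72, 85]
weights = [1, 2, 4]

print(round(weighted_mean(scores, weights), 1))  # 77.7
print(round(sum(scores) / len(scores), 1))       # 72.3
```

With equal weights the function reduces to the ordinary arithmetic mean.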

The mean has other important properties. The sum of deviations 
of the scores in a series from their mean is always zero. This fact is 
frequently useful in checking or simplifying statistical procedures. It 
is also true that the sum of squared deviations is always less when 
deviations are taken from the mean than from some value other than 
the mean. Arithmetically, if we have the scores 4, 9, 12, and 15, their
mean is 10, and the deviations from the mean are -6, -1, 2, and 5.
These sum to zero. The sum of squares of the deviations is (-6)² +
(-1)² + (2)² + (5)² or 66, and this is less than the sum of squares
of deviations from some value other than 10.
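
Both properties can be checked numerically for the four scores. The helper name `sum_of_squares` and the trial values 9 and 11 are chosen only for illustration.

```python
scores = [4, 9, 12, 15]
m = sum(scores) / len(scores)  # 10.0


def sum_of_squares(center):
    """Sum of squared deviations of the scores from the given point."""
    return sum((x - center) ** 2 for x in scores)


print(sum(x - m for x in scores))             # 0.0
print(sum_of_squares(m))                      # 66.0
print(sum_of_squares(9), sum_of_squares(11))  # 70 70
```

The excess of 4 in each trial agrees with the identity Σ(X - c)² = Σ(X - M)² + N(M - c)², which here gives 66 + 4(1)² = 70 for any point one unit from the mean.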

In estimating from a sample of scores the point or location of the 
central tendency of a population, the mean tends to give a more accurate 
estimate than the mode or median. (See Ex. 10.) In statistical theory, 
it can be shown that the means of random samples drawn from a normal 
population differ less among themselves and less from the actual point 
of central tendency of the population than do other averages. We 
return to this important fact in a later chapter. 

All in all, the mean is an excellent average and, as a rule, it is the 
average to use unless there is special reason for not using it. 


Interpretation and Use of Measures of Central Tendency


When the student first encounters measures of central tendency, 
he may become so engrossed in their calculation that he loses sight of 
their meaning. In the most general sense, the calculation of a measure 
of central tendency is a process of reducing a statistical series to a 
single, summarizing figure. The process is necessary in comparing and 
describing series for the simple reason that the mind cannot grasp the 
meaning of a series in all of its details. 

The reduction of a series to an average value is not without danger
of distorting information. Variability is an important feature of a
statistical series. An average value conceals this feature, and a comparison
of average values may be unfair and misleading if the series are
dissimilar in variability. An average does not have meaning independent
of the other characteristics of a statistical series; in fact, if a series is
highly variable or irregular and rich in detail, an average may have no
real meaning and serve no useful purpose at all.


APPROPRIATE USES OF AVERAGES 


The question of which average to use in summarizing a given series
is an important question, but one which permits no thumb-rule answers.
A question that antecedes “which average” is whether any average
will facilitate useful analysis and comparison.

Assuming that a given series is amenable to reduction to an average
value of some sort, the selection of a particular average involves the
considerations that have been dealt with in previous sections of this
chapter. In summary form these are:


The arithmetic mean is the most widely used and useful measure of 
central tendency. It is the most reliable measure, as a rule, and is simply 
and clearly defined. It perhaps best expresses the idea of an average 
value. Being an algebraic quantity, the mean is tractable in mathematical 
analysis. The most precise measures of variability and relationship (to 
be described later) involve the mean. It is generally advisable to use the 
mean as the measure of central tendency, unless there is special reason 
for not using it. 

The median is particularly useful in four situations. First, if a series 
contains a few extreme or exceptional values, the median generally 
gives a fairer impression of the average value of the series than the mean. 
It is usually the case that when the median of a distribution is markedly 
different from the arithmetic mean, the former is the better average. 
Second, if there is doubt regarding the nature of the unit of measure- 
ment, the summation of a set of scores may be unsound. In this situa- 
tion, the median as a point below (or above) which one half of the scores 
lies is perhaps the most accurate statement of central tendency which can 
be made. Third, if a distribution has an upper or lower class interval of 
unspecified length, the median is the most reliable measure which can be 
obtained. Fourth, the median is a member of the percentile system, and
hence is an appropriate average when a distribution is described and 
interpreted in terms of percentiles. (See Chap. IV.) 
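
The first situation can be illustrated with a short sketch. The salary figures below are hypothetical and chosen only to show how a single extreme value pulls the mean upward while leaving the median where it was.

```python
from statistics import mean, median

# Hypothetical salaries in dollars; the last value is an extreme case.
salaries = [5200, 5600, 5900, 6100, 6400, 6800, 25000]

print("mean:  ", round(mean(salaries), 2))   # pulled upward by the extreme value
print("median:", median(salaries))           # the middle value of the ordered series

# Making the extreme value ten times larger moves the mean again,
# but the median is unchanged because the middle position is unchanged.
inflated = salaries[:-1] + [250000]
print("mean:  ", round(mean(inflated), 2))
print("median:", median(inflated))
```

When the two averages differ this much, the series deserves a closer look before either figure is reported alone.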

The mode is appropriate when a quick approximation to the point of 
concentration or “piling up" of the items in a series is desired. It is the 
only average available, if information regarding the value of greatest 
frequency or the most typical case is needed. Except for these rather 
unusual cases, the mode has little utility as an average in applied sta- 
tistics. It is an unreliable and nonalgebraic measure. The concept of 
mode is primarily useful in analyzing and interpreting series having 
two or more points of concentration. 


No unqualified answer can be given to the question of which 
average, if any, should be used for a given series. The user can only be
expected to have reasons for his choice of a particular average, reasons
which are supported by the nature of the given data, the properties of 
the average, and the issue upon which the data are expected to throw 
light. The best advice that can be given to the student at this point is to 
become thoroughly acquainted with the limitations and advantages of 
each average. 

We have discussed the three most widely used averages. There are 
several others, two of which may be useful in special situations. These 
are the geometric mean and the harmonic mean. The former is appro- 
priate in averaging data, such as growth data, in which the items tend 
to differ by a constant ratio; the latter is appropriate in averaging 
rates. (See Ref. 42, pp. 102-109.) 
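
A brief sketch of these two averages follows. The data are hypothetical: two yearly growth ratios, and two speeds maintained over equal distances. The functions come from Python's standard `statistics` module.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical growth ratios: a quantity doubles in one year and
# grows eightfold in the next. The geometric mean is the constant
# yearly ratio that would produce the same total growth (2 * 8 = 16).
ratios = [2.0, 8.0]
g = geometric_mean(ratios)
print("geometric mean ratio:", round(g, 4))

# Hypothetical rates: 30 and 60 miles per hour over equal distances.
# The harmonic mean is the average speed for the whole trip.
speeds = [30.0, 60.0]
h = harmonic_mean(speeds)
print("harmonic mean speed:", round(h, 4))
print("arithmetic mean speed:", mean(speeds))  # overstates the trip's average speed
```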


CENTRAL TENDENCY OF THE QUALITATIVE SERIES 


In the qualitative series, as was noted in Chapter I, frequencies in
categories constitute the information of statistical interest. Obviously,
an average of frequencies, in the usual sense, would have little meaning.
Suppose the religious preferences of a group of 525 individuals are:

PREFERENCE      NUMBER
Catholic         150
Methodist        100
Baptist           75
Episcopalian      50
Presbyterian      50
Other            100


Neither a median nor a mean value of the data as classified would have 
any useful meaning. The modal preference might be reported as Catho- 
lic (or, if the preferences were classified as Catholic and Protestant, as 
Protestant), but the information of concern is the breakdown by 
categories. As a rule the concept of averages is not appropriate in 
dealing with qualitative series. 

Occasionally, however, it is useful to think of relative frequencies 
or proportions as arithmetic means. Suppose 60 in a group of 100 
students pass a test item and 40 fail it. The proportion passing is 
60/100. If the passes are scored “1” and the failures “0,” 60/100 is 
the arithmetic mean of the 100 scores on the item. (We shall return 
to this matter in connection with test item analysis.) In the same sense, 
a baseball batting average may be thought of as an arithmetic mean. 
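
The test-item example can be checked directly. In this minimal sketch the 100 scores are built from the figures in the text (60 passes scored 1, 40 failures scored 0).

```python
from statistics import mean

# 60 passes scored 1 and 40 failures scored 0, as in the text.
item_scores = [1] * 60 + [0] * 40

proportion_passing = 60 / 100
item_mean = mean(item_scores)

# The arithmetic mean of the 0/1 scores is the proportion passing.
print(proportion_passing, item_mean)
```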

When we think about proportions as arithmetic means, we can
readily see why two or more proportions or percentages cannot be
combined without reference to their base numbers. Given the proportions
and their respective base numbers, $p_1$, $N_1$, and $p_2$, $N_2$, the
combined proportion $p_c$ will be

$$p_c = \frac{p_1 N_1 + p_2 N_2}{N_1 + N_2} \tag{3.6}$$

which is exactly analogous to formula (3.5). (See Ex. 20e.)
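
As a sketch of formula (3.6), the function below weights each proportion by its base number. The function name and the two groups are hypothetical, chosen only to show how far the weighted result can lie from the simple average of the proportions.

```python
def combined_proportion(proportions, base_numbers):
    """Combine proportions by weighting each with its base number, as in (3.6)."""
    if len(proportions) != len(base_numbers):
        raise ValueError("each proportion needs its own base number")
    total = sum(base_numbers)
    return sum(p * n for p, n in zip(proportions, base_numbers)) / total

# Hypothetical groups: 60 per cent of 100 cases and 90 per cent of 300 cases.
proportions = [0.60, 0.90]
base_numbers = [100, 300]

weighted = combined_proportion(proportions, base_numbers)
unweighted = sum(proportions) / len(proportions)

print("combined proportion:", round(weighted, 3))
print("simple average of the proportions:", round(unweighted, 3))
```

The two figures agree only when the base numbers are equal.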


THE EFFECT ON AVERAGES OF CHANGING EACH SCORE BY A 
CONSTANT 


It is frequently useful to change each score in a series by a constant. 
When the scores are large, reducing each by a constant may make them 
more manageable; when the scores involve decimals, multiplying each 
by an appropriate constant will eliminate decimals. It is left as an 
exercise for the student to show that when each score is changed by a 
constant, the mode, median, or mean is changed in exactly the same 
way. It is easy to verify that when each score is increased by 5, say, the 
averages are increased by 5 or that when each score is multiplied by 
100 the averages are 100 times what they were. 

After an average of changed scores is found, that average may 
readily be converted to the average of the original scores by reversing 
the change. 
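
A short check of this property on a hypothetical series of five scores may be sketched as follows. It is a numerical illustration, not the proof the exercise asks for.

```python
from statistics import mean, median, mode

scores = [3, 5, 5, 6, 11]              # hypothetical series
shifted = [x + 5 for x in scores]      # each score increased by 5
scaled = [x * 100 for x in scores]     # each score multiplied by 100

for name, average in (("mode", mode), ("median", median), ("mean", mean)):
    print(name, average(scores), average(shifted), average(scaled))

# Reversing the change recovers the average of the original scores.
recovered_mean = mean(shifted) - 5
```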

EXERCISES 


1. Why are measures of central tendency needed?
2. The data in the distribution below were obtained in the administration
of a Wechsler-Bellevue arithmetic reasoning problem to 65 adults, 54 of
whom worked the problem incorrectly. The distribution shows that 5 of
the 54 spent from 11 to 21 seconds on the problem, 15 spent from 22 to
32 seconds, and so on. Do the data suggest anything about the nature
of errors or the nature of people making errors on the problem? How
would you study the matter further?

TIME IN SECONDS    FREQUENCY
11-21                  5
22-32                 15
33-43                  5
44-54                  4
55-65                  8
66-76                 12
77-87                  5


3. A mathematics background test was given to a large group of first-year 
statistics students. The results were definitely bimodal. What are some 
possible reasons for the bimodality? 

4. Compute the median of one or more of the distributions in Table 3.1. 

5. Using one of the distributions in Table 3.1, show that the median can be 
computed, working from the top of the distribution instead of the bottom. 


6. Combine distributions A and B of Table 3.1 and find the median of the
combined distributions. (a) Could this median be obtained from the
medians of the separate distributions? (b) In what kinds of distributions
could the total median be obtained from the separate medians?

7. The distribution of the salaries of teachers and administrators in a small
town is shown below. What is the median salary? How much is the
median changed if the three top salaries are not included?

SALARY              FREQUENCY
$10,000-$10,499         1
  9,500-  9,999         0
  9,000-  9,499         2
  8,500-  8,999         0
  8,000-  8,499         0
  7,500-  7,999         6
  7,000-  7,499         6
  6,500-  6,999        15
  6,000-  6,499        10
  5,500-  5,999        12
  5,000-  5,499         4

8. Compute one or more of the means of the distributions of Table 3.1.
For which of the distributions do you believe the median is a better
measure of central tendency than the mean?

9. Find the mean of the distribution of salaries in Exercise 7, above. How
much is the mean changed if the three top salaries are not included?

10. Ten samples of 50 each were drawn at random from the normal population
of scores, Table B, Appendix, in which the mode, median, and mean
are each 40. The modes, medians, and means of the 10 samples are
shown below. (a) On the whole, which of the three gives the best estimate
of the population average? (b) Draw several random samples of
50 each from the population and compare the modes, medians, and
means with those below. (c) Draw several samples of 10 each and compare.
What does this show about the relation of the size of the sample
to the accuracy of estimates of population values?

MODE:   44 42 4 3 37 42 42 52 4 4
MEDIAN: 40.5 41.8 42.0 38.7 38.0 41.1 39.5 40.8 37.8 40.4
MEAN:   39.7 40.6 40.8 39.1 39.5 41.2 38.6 40.7 37.4 40.0

11. The mean as defined by formula (3.2) is $M = \Sigma X/N$. Can you find
$\Sigma X$ if you know M and N? Use this fact to show that formula (3.5)
gives the mean of two or more series combined.

12. If the N scores of a series are represented by $X_1, X_2, \ldots, X_N$ and the
mean by M, the deviations of the scores from the mean are $X_1 - M$,
$X_2 - M, \ldots, X_N - M$. Show that the mean of these deviations is
zero. If necessary, study the matter by examining several short series,
such as 1, 3, 5, 5, 6.


13. If a constant is subtracted from each score in a series, what is the effect
on the mean? Utilize this information in finding the mean of the scores
91, 94, 98, 98, 99.

14. An object is weighed five times on a laboratory balance, and the readings
are 2.66 gm., 2.60 gm., 2.64 gm., 2.64 gm., and 2.61 gm. Under
what conditions will the arithmetic mean of these readings be a good
estimate of the "true" weight of the object?

15. Five judges, using a 10-point scale, rated an individual on initiative as
follows: 9, 7, 7, 6, 4. Under what conditions would the arithmetic mean
of the ratings be a good estimate of the "true" initiative of the individual?

16. A college student completed 120 hrs. of undergraduate work with hours
and grades distributed as shown below. Suggest a method of arriving
at a numerical average grade for the student. In what way is any average
misleading?

FIELD          HRS.   GRADE     FIELD           HRS.   GRADE
Mathematics     36      A       Economics         8      C
Science         12      A       Philosophy        6      C
Science         12      B       Fine Arts         6      D
English         12      D       Physical Ed.      4      D
History         10      E       Music             4      D
Language        10      D

17. Sketch unimodal negatively skewed, symmetrical, and positively skewed
distributions and indicate the relative positions of the mode, median, and
mean in each.

18. It has been pointed out that the mean wages of men is quite different
from the wages of the average man. Which would you prefer? Why?

19. In an experiment to determine whether one method of teaching logic was
superior to a second method in increasing skill in critical thinking, two
groups of logic students were equated. One group was taught by the
first method, the other by the second. The gains of the individual students
under the respective methods, as indicated by the differences between
scores on preexperiment and postexperiment tests of critical
thinking, are shown in the distributions below. The experimenter compared
mean gains of the two groups and concluded that the first method
was superior. Do you agree? Explain.

GAINS          FIRST METHOD   SECOND METHOD
+25 to +29          2               1
+20 to +24          3               7
+15 to +19          3               6
+10 to +14          6               2
 +5 to  +9          8               1
  0 to  +4          5               2
 -5 to  -1          2               4
-10 to  -6          1               4
-15 to -11          0               3

                   30              32


20. Criticize each of the following uses of averages:

a. The mean age of men and women at time of marriage in a certain city,
based on records for a five-year period, was reported as 26.4 yrs.

b. In an endowment drive put on by a small college, 1,252 donations
were received, ranging from $1 to $250,000 and totaling $355,000.
The college announced an average donation of about $284.

c. An instructor's class had a mean of 76 and a median of 68 on a
standardized test. The instructor concluded that 76 was the average.

d. A chamber-of-commerce brochure advertised a Midwestern city as
an ideal place to live, since the mean temperature was 72°.

e. In samples of 90, 150, and 200 an investigator found the percentage
of maladjusted individuals to be 60, 70, and 80, respectively. He concluded
that the percentage of maladjusted individuals in the three
samples combined was 70.


CHAPTER IV 


Characteristics of Statistical 
Series. Variability 


The essential and all-pervasive characteristic of statistical series is
variability. The frequency distribution owes its distinctive properties
to the extent and manner of variation of the items it comprises; it is the 
variation of statistical data which gives meaning and usefulness to the 
concept of average value. The whole of statistics might well be char- 
acterized as the study of variability. 

In the previous chapter, we have seen that averages, such as the 
mean, although necessary in describing and comparing series, conceal 
information about variability. 

The distributions of intelligence quotients for four eighth-grade 
classes are shown in Table 4.1. The distributions are observably differ- 
ent in the way the items vary about their respective means. More than 
a quarter of the IQ's in Schools C and G lie outside the range of IQ's 
in School B; nearly a quarter of the IQ's in Schools B, C, and G are
above the highest IQ in School J. Important information would be 
concealed if only the mean IQ's in the classes were reported. 

TABLE 4.1

Distributions of Intelligence Quotients in Samples
of Eighth-Grade Pupils in Four Schools

INTELLIGENCE
QUOTIENTS      SCHOOL B   SCHOOL C   SCHOOL G   SCHOOL J

[Class intervals run from 130-134 down to 50-54 in steps of 5; the
frequency columns are not legible in the source.]

NUMBER            18         38         32         29
MEAN             93.94      93.32      91.22      89.41

For further illustration, consider again the data of Table 3.1. The
majority of the distributions in the table are characterized by differences
in amount and manner of variation of the scores as well as in central
tendencies. A comparison of arithmetic achievement in the schools on
the basis of average values alone, although possibly sufficient for some
purposes, is not as complete and conclusive as the data permit.

It is generally the case that we need to take variability into account 
in describing and comparing statistical data. The variability of a series 
may be more important and reveal more about the series than an 
average value. Besides, averages are always more meaningful and less 
susceptible to misinterpretation when accompanied by statements re- 
garding variability. 

The purpose of this chapter is to examine various methods of 
measuring variability and the interpretation and use of the measures. 
The amount or extent of variability of a statistical series is described 
quantitatively by various measures, the most common of which are 
range, quartile deviation, mean or average deviation, and standard 
deviation. These and their uses will be considered at some length. 
Finally, measures of skewness and kurtosis, which relate to irregularities 
in variation, will be discussed. 




The computation of some of the measures used in describing and 
interpreting variability is somewhat laborious, but the meaning of the 
measures is not difficult to grasp. They are merely quantities by which 
we analyze and describe the amount and peculiarities of the variation 
characterizing statistical data. 


The Range 


The simplest way of describing the variability of the values in a 
series is to state the difference between the highest and lowest values. 
Such a difference is known as the range. Thus, if the highest score in a
series is 85 and the lowest 45, the range is 40.

When the data are grouped, as are the IQ's in Table 4.1, the in- 
dividual items lose their exact values, and there is no way of determining 
the actual range. For grouped data, either the difference between the 
midpoints of the highest and lowest class intervals, or the difference 
between the higher expressed limit in the top class and the lower ex- 
pressed limit in the bottom class, may be taken as the range. Ordinarily 
the range is used with reference to ungrouped data. 

When we examine the distributions of Table 4.1, we note that the 
ranges, although roughly indicative of dispersion or variability, fail to 
give any information about the variation of the IQ's between the extremes.
The range of the great majority of IQ's in School G is from
about 70 to 114; in School J, from about 80 to 104.

In general, the range is not a representative measure of the varia- 
bility of a series. Since it is based upon the values of the two extreme 
items, it tells nothing about the variation of the intermediate items. 

The range is easily determined and easily understood. In samples 
from normal populations, it provides a useful estimate of population 
variability. In descriptive statistics, the range is chiefly used as a
supplementary measure. In most situations a statement regarding the
range, in addition to a more trustworthy measure of variability, adds to
the description of the data.


Interpercentile Measures 


As we have seen, the range is the difference between the highest and
lowest scores in a series. Described in another way, it is the distance
on the scale which includes 100 per cent of the scores. Its chief limitation
is its dependence on the two extremes.


There are several measures of variability that are independent of 
the extremes. Before taking up the two most common of these, based 
on the range of the middle 50 per cent and the middle 80 per cent of the 
scores, we need to consider the meaning and computation of percentiles.

A percentile is defined as the score or point on the scale of scores 
below which a specified percentage of the scores lie. Some commonly 
used percentiles are the points below which 25 per cent, 50 per cent, 
and 75 per cent of the scores lie. These are known as the 25th, 50th,
and 75th percentiles, respectively, and are designated $P_{25}$, $P_{50}$, and
$P_{75}$. $P_{25}$ is often called the first quartile and designated $Q_1$; $P_{50}$, the
second quartile and designated $Q_2$; and $P_{75}$, the third quartile and
designated $Q_3$. It will be seen that $P_{50}$, $Q_2$, and the median are different
ways of expressing the same thing, namely, the point below which half
of the scores lie. Other commonly used percentiles are $P_{10}$ and $P_{90}$,
known as the first and ninth deciles; and $P_{20}$, $P_{40}$, $P_{60}$, and $P_{80}$, known
as the first, second, third, and fourth quintiles. Thus, the quartiles
divide a distribution into four equal parts; the quintiles into five equal 
parts; the deciles into ten equal parts; and the percentiles into one 
hundred equal parts. Percentiles are sometimes referred to as centiles. 

The Pth percentile, $P_p$, of any distribution may readily be found
by a procedure similar to that used in finding the median. The formula is,

$$P_p = L + \left(\frac{PN - F}{f}\right) i \tag{4.1}$$

in which $P_p$ is the percentile desired (e.g., if the tenth percentile is
desired, $P_p$ is $P_{10}$); L is the lower real limit of the class interval containing
$P_p$; PN is the number of cases to be counted off to reach $P_p$; F is the
total number of cases below the class containing $P_p$; f is the number of
cases in the class containing $P_p$; and i is the class interval. The application
of the formula in finding $P_{10}$, $P_{25}$, $P_{75}$, and $P_{90}$ is illustrated in
Table 4.2.
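
Formula (4.1) can be sketched as a short function. The function and variable names are illustrative, P is taken in per cent so that the number of cases to be counted off is P per cent of N, and the frequencies are those of the 293 reading scores of Table 4.2, listed from the lowest class (6-8) to the highest (57-59).

```python
def grouped_percentile(p, lower_real_limits, frequencies, interval):
    """Return the p-th percentile of a grouped distribution by formula (4.1).

    p is given in per cent; classes are listed from lowest to highest.
    """
    n = sum(frequencies)
    cases_needed = p * n / 100      # cases to count off from the bottom
    cases_below = 0                 # F: cases below the current class
    for lower, f in zip(lower_real_limits, frequencies):
        if f > 0 and cases_below + f >= cases_needed:
            # L + ((PN - F) / f) * i
            return lower + (cases_needed - cases_below) / f * interval
        cases_below += f
    raise ValueError("p must lie between 0 and 100")

# Frequencies of Table 4.2, from the class 6-8 up to the class 57-59.
frequencies = [1, 1, 4, 3, 11, 16, 14, 29, 35, 44, 33, 33, 25, 26, 17, 0, 0, 1]
interval = 3
lower_real_limits = [5.5 + interval * k for k in range(len(frequencies))]

for p in (10, 25, 75, 90):
    value = grouped_percentile(p, lower_real_limits, frequencies, interval)
    print(f"P{p} = {value:.2f}")
```

The same function gives the median when p is 50.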

TABLE 4.2

Computation of Percentiles in a Distribution of
293 Reading Scores

SCORE     f     CUM f
57-59      1     293
54-56      0     292
51-53      0     292
48-50     17     292
45-47     26     275
42-44     25     249
39-41     33     224
36-38     33     191
33-35     44     158
30-32     35     114
27-29     29      79
24-26     14      50
21-23     16      36
18-20     11      20
15-17      3       9
12-14      4       6
 9-11      1       2
 6-8       1       1

COMPUTATION OF PERCENTILES BY FORMULA (4.1)

Required to find $P_{10}$, $P_{25}$, $P_{75}$, $P_{90}$

10% of 293 = 29.30
25% of 293 = 73.25
75% of 293 = 219.75
90% of 293 = 263.70

$$P_{10} = 20.5 + \left(\frac{29.30 - 20}{16}\right)3 = 22.24$$

$$P_{25} = 26.5 + \left(\frac{73.25 - 50}{29}\right)3 = 28.91$$

$$P_{75} = 38.5 + \left(\frac{219.75 - 191}{33}\right)3 = 41.11$$

$$P_{90} = 44.5 + \left(\frac{263.70 - 249}{26}\right)3 = 46.20$$

To find percentiles in a series of ungrouped scores, we first
arrange the scores in order of size. When this is done, the position of
the Pth percentile is given by
P(N + 1)/100, counting from low to high. Thus in a sample of 11,
the 25th percentile or $P_{25}$ is equal to the score in the 25(11 + 1)/100
or the third position. When the position given by P(N + 1)/100 is
not a whole number, as is frequently the case, we must interpolate
between the scores on either side of the position. To do this, we multiply
the difference between the two scores by the decimal part of
P(N + 1)/100 and add the product to the lower score. For example, to
find $P_{75}$ in the 12 scores,

2  5  8  9  11  12  15  17  18  23  25  28,

we find that 75(12 + 1)/100 is equal to 9.75; take .75 of the difference
between 18 and 23, the scores in the 9th and 10th positions; and add
the product to 18. We get $P_{75}$ = 21.75.

This method of finding percentiles in ungrouped data is particularly
convenient when the data are punched in cards and may be rapidly
arranged in order of size.
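
The position rule P(N + 1)/100 with interpolation may be sketched as below. The function name is illustrative. The twelve scores are those of the example in the text, with the fifth score read as 11, an assumption that does not affect the 75th percentile.

```python
import math

def ungrouped_percentile(p, scores):
    """Percentile of ungrouped scores by the position rule P(N + 1)/100."""
    ordered = sorted(scores)
    n = len(ordered)
    position = p * (n + 1) / 100          # counted from low to high, starting at 1
    if position < 1 or position > n:
        raise ValueError("the position falls outside the series")
    lower = math.floor(position)          # whole-number part of the position
    fraction = position - lower           # decimal part of the position
    if fraction == 0:
        return ordered[lower - 1]
    # Interpolate between the scores on either side of the position.
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

# Fifth score assumed to be 11; it plays no part in finding P75.
scores = [2, 5, 8, 9, 11, 12, 15, 17, 18, 23, 25, 28]
p75 = ungrouped_percentile(75, scores)
print("P75 =", p75)

# In a sample of 11 the 25th percentile is the score in the third position.
sample_of_11 = list(range(1, 12))
print("P25 =", ungrouped_percentile(25, sample_of_11))
```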


THE QUARTILE DEVIATION 

In any distribution or series, the difference between $P_{75}$ and $P_{25}$
is called the interquartile range. When the interquartile range is divided
by 2, the quotient is known as the semi-interquartile range, or the
quartile deviation, and is usually abbreviated to Q. Thus,

$$Q = \frac{P_{75} - P_{25}}{2} \tag{4.2}$$

Since $P_{75}$ and $P_{25}$ are the same as $Q_3$ and $Q_1$, the quartile deviation is
often expressed as $(Q_3 - Q_1)/2$. For the data of Table 4.2, where
$P_{75}$ or $Q_3$ is 41.11 and $P_{25}$ or $Q_1$ is 28.91, the quartile deviation is

$$Q = \frac{41.11 - 28.91}{2} = 6.10.$$

This tells us that half of the range of the middle 50 per cent of the
scores is 6.10.

Since it is independent of the extreme scores in a series, the quartile 
deviation is a more representative measure of variability than the overall 
range. 

If the distribution is symmetrical, $Q_1$ and $Q_3$ are equidistant from
the median. Consequently, if we lay off a Q distance in both directions
from the median in a symmetrical distribution, we will include 50 per
cent of the scores. When a distribution is skewed, as is usually the case,
$\pm Q$ from the median includes only approximately 50 per cent of the
scores; however, the approximation tends to be good unless the distribution
is severely skewed.

In summary, the quartile deviation is easy to compute and easy to 
interpret. It is applicable to most frequency distributions, including 
those having unequal class intervals and those having bottom or top 
intervals of unspecified length. It pairs naturally with the median, and 
in situations where the median is preferred as the measure of central 
tendency, Q is preferred as the measure of variability. 

In describing a series, it is a good plan to report $Q_3$ and $Q_1$ as well
as the median and Q. Given the four summary figures, the reader can 
picture a great deal about the series. 


THE DECILE DEVIATION 


The decile deviation, D, is defined as the distance between the 10th
and 90th percentiles; i.e., $D = P_{90} - P_{10}$. For the distribution of
Table 4.2, where $P_{90}$ is 46.20 and $P_{10}$ is 22.24, D = 23.96. This tells
us that the range of the middle 80 per cent of the reading scores is 23.96.

As a measure of variability, D is somewhat better than Q, but is
not as widely used. Other interpercentile measures that are occasionally
used include $P_{67} - P_{33}$ and $P_{93} - P_7$, the former being the range
of the approximate middle third of the scores, the latter being the range
of the middle 86 per cent.
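
The two interpercentile measures reduce to simple differences once the percentiles are known. The sketch below uses the four percentiles reported for the reading scores of Table 4.2; the function names are illustrative.

```python
def quartile_deviation(p25, p75):
    """Half the range of the middle 50 per cent of the scores."""
    return (p75 - p25) / 2

def decile_deviation(p10, p90):
    """Range of the middle 80 per cent of the scores."""
    return p90 - p10

# Percentiles of the 293 reading scores, as reported in the text.
p10, p25, p75, p90 = 22.24, 28.91, 41.11, 46.20

q = quartile_deviation(p25, p75)
d = decile_deviation(p10, p90)
print(f"Q = {q:.2f}")
print(f"D = {d:.2f}")
```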

Before leaving interpercentile ranges, we should note that percentiles
may be obtained quickly and accurately enough for practical
purposes from the cumulative percentage curve, discussed in the next
chapter.


The Average Deviation 


The quartile deviation and other interpercentile range measures 
which have been discussed do not take into account the variation of the 
individual items in a series. When it is desired to consider all of the 
fluctuations in value that characterize a series, such measures cannot 
be used. 

The simplest method of taking into account the variation of all 
the items in a series is that of finding their average deviation from a 
selected value, usually a point of central tendency. Either the mode, 
median, or the mean may be selected as the point of central tendency; 
since the mean ordinarily is used, however, we shall limit our discussion 
to the average deviation from the mean. 

The time scores in seconds of 12 subjects on a Wechsler-Bellevue
Picture Arrangement task are listed below, and the deviation of each
from the mean value of 13.9 is shown.

         DEVIATION             DEVIATION             DEVIATION
SCORE    FROM MEAN    SCORE    FROM MEAN    SCORE    FROM MEAN
 21         7.1        10        -3.9        15         1.1
 18         4.1        20         6.1         8        -5.9
  6        -7.9         8        -5.9        10        -3.9
 12        -1.9        13        -0.9        26        12.1

The algebraic sum of the deviations is, of course, zero (within rounding
tolerance), but if the signs are disregarded their sum is 60.8 with a mean
of 60.8/12 or 5.1. The quotient 5.1 is the mean of the absolute deviations.
In other words, on the average the scores deviate from their mean by
5.1 seconds.

When X represents the scores in a series, x is conventionally used
to represent the deviations of the scores from their mean, i.e.,
X - M = x, the mean always being subtracted from the score. Using
this notation, the average deviation from the mean is defined by

$$AD = \frac{\Sigma |X - M|}{N} = \frac{\Sigma |x|}{N} \tag{4.3}$$

in which the symbol | | denotes absolute values and tells us that the
algebraic signs are to be disregarded in the summation. The symbol AD
is generally understood to indicate average deviation from the mean.
If the median or some other point is selected from which to take
deviations, the fact must be reported. The average deviation is also referred
to as the mean deviation.
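
The computation for the 12 time scores may be sketched as follows. The function name is illustrative; the scores are those listed in the text.

```python
def average_deviation(scores):
    """Mean of the absolute deviations of the scores from their mean."""
    n = len(scores)
    m = sum(scores) / n
    return sum(abs(x - m) for x in scores) / n

# Time scores in seconds of the 12 subjects.
times = [21, 18, 6, 12, 10, 20, 8, 13, 15, 8, 10, 26]

m = sum(times) / len(times)
signed_sum = sum(x - m for x in times)   # zero apart from floating-point error
ad = average_deviation(times)

print("mean:", round(m, 1))
print("algebraic sum of deviations:", round(signed_sum, 6))
print("average deviation:", round(ad, 1))
```

Because the mean is carried here to full precision rather than rounded to 13.9, the intermediate sums differ slightly from those in the text, but the average deviation rounds to the same 5.1 seconds.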


AVERAGE DEVIATION IN GROUPED DATA 


When data are grouped and the items are assumed to have the
mid-values of their respective class intervals, the average deviation, as
defined in formula (4.3), becomes

$$AD = \frac{\Sigma f|x'|}{N} \tag{4.4}$$

in which f is the frequency in a class and x' is the deviation of the class
midpoint from the mean.

To find the average deviation of a grouped series we need only to 


a. Find the deviations of the midpoints of the class intervals from the 
mean. 

b. Multiply each deviation by the corresponding class frequency. 

c. Add the products, disregarding signs. 

d. Divide the sum by N. 
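
Steps a through d can be sketched in a few lines. The three-class distribution below is hypothetical and kept small so that the arithmetic can be followed by hand; the function name is illustrative.

```python
def grouped_average_deviation(midpoints, frequencies):
    """Average deviation from the mean for grouped data, following steps a to d."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    deviations = [x - mean for x in midpoints]                   # step a
    products = [f * d for f, d in zip(frequencies, deviations)]  # step b
    total = sum(abs(p) for p in products)                        # step c
    return total / n                                             # step d

# Hypothetical classes 1-3, 4-6, 7-9 with midpoints 2, 5, 8.
midpoints = [2, 5, 8]
frequencies = [1, 2, 1]

ad = grouped_average_deviation(midpoints, frequencies)
print("average deviation:", ad)
```

Here the mean is 5, the products are -3, 0, and 3, and their sum without regard to sign is 6, which divided by N = 4 gives 1.5.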


USES AND LIMITATIONS OF THE AVERAGE DEVIATION 


The average deviation is the simplest measure of variability avail- 
able that takes into account the fluctuations of all the items in a series. 
It is the most meaningful measure to the person untrained in statistics. 
The concept of the average of all deviations from the mean of a series 
is entirely intelligible as a measure of variability. 

The average deviations of the IQ distributions in Table 4.1 are: 


School B 9.04 
School C 11.40 
School G 12.49 
School J 7.50 


The evidence that the IQ's in the G series deviate on the average more 
from their mean than the IQ's in the B, C, and J series is simple and 
clear. We know that the G series is characterized by more variability, 
in the sense of average deviation from the mean IQ, than the others. 

The average deviation has two noteworthy limitations. Since it is 
based upon all of the deviations, it may be inflated by a single extreme 
value. If one of the IQ's in the 90-94 class of the B series had been, say,
145, the average deviation would have been about 12 instead of 9.04. 
When a series is long, however, and not highly irregular at the extremes, 
this fact is of little moment. Moreover, extreme values inflate the
average deviation somewhat less than other measures which take into
account all deviations.


The second limitation is of much greater consequence and accounts 
for the rather infrequent use of the average deviation. As has been 
seen, the signs of the deviations from the mean must be ignored in 
finding the average deviation. While the disregard of signs is entirely 
sensible, since negative deviations have the same influence upon amount 
of variation as positive, it results in a nonalgebraic quantity. Conse- 
quently, the average deviation is unwieldy in mathematical operations 
and has very limited use in statistical theory. 


The Standard Deviation 


Of the several measures of variability, the standard deviation is by 
far the most used and important. It and its close relatives, the "sum of
squares" and the "variance," occupy a central position in statistical
theory. 

There is no satisfactory way of describing the standard deviation 
other than by stating the operations by which it is calculated. Like the 
average deviation, it is based upon the deviations of all values in a series. 
In its calculation, however, the signs of the deviations are not disre- 
garded. Instead, the negative signs are eliminated by squaring each 
deviation. After the deviations are squared, the squares are summed, 
divided by N, and the square root of the quotient is extracted, the final 
operation translating the quantity back to the linear unit of measure- 
ment. 

The operations in finding the standard deviation of a series thus
include

a. Finding the deviation of each value from the mean.

b. Squaring the deviations.

c. Summing the squares.

d. Dividing the sum by N.

e. Extracting the square root of the quotient.

To illustrate the procedure, let us find the standard deviation of a
simple series.

                                    DEVIATION FROM MEAN
SCORE    DEVIATION FROM MEAN        SQUARED
  X        X - M or x                 x²
 22           -1                       1
 20           -3                       9
 25            2                       4
 30            7                      49
 18           -5                      25

                              Σx² =   88

The standard deviation of the series, in which N = 5, is the square root
of 88/5. Thus, S.D. = $\sqrt{17.6}$ or about 4.2.

When we represent the deviation of an item from the mean of its
series by x, we may define the standard deviation:

$$\text{S.D.} = \sqrt{\frac{\Sigma x^2}{N}} \tag{4.5}$$

If we square both members of (4.5) we have $(\text{S.D.})^2 = \Sigma x^2/N$. The
square of the standard deviation is designated the variance, and the
quantity $\Sigma x^2$ is designated the sum of squares. These are technical
terms which, in statistics, are always defined as above. In other words,
the sum of squares of a series is the sum of the deviations (from the
mean) squared; the variance is the sum of squares divided by N; and
the standard deviation is the square root of the variance. In the example
above, the sum of squares is 88 and the variance is 17.6.
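
The three quantities can be computed for the five-score series of the example in a few lines. The function names are illustrative.

```python
import math

def sum_of_squares(scores):
    """Sum of the squared deviations of the scores from their mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def variance(scores):
    """Sum of squares divided by N."""
    return sum_of_squares(scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))

scores = [22, 20, 25, 30, 18]

print("sum of squares:", sum_of_squares(scores))
print("variance:", variance(scores))
print("standard deviation:", round(standard_deviation(scores), 1))
```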

We digress here for a moment to consider somewhat confusing
conventions respecting notation. Some writers use the symbol s to
denote the standard deviation of a given series; some use the Greek
symbol $\sigma$ (sigma). In modern statistical theory, it is the preferred
practice to use $\sigma$ to denote the unknown standard deviation in the
population and s to denote an estimate of $\sigma$, based on sample evidence.
Now it can be shown that the best estimate of $\sigma$ results from multiplying
the standard deviation of the sample by $\sqrt{N/(N - 1)}$. It will be seen
that the same result may be obtained by dividing the numerator of
(4.5) by N - 1 instead of N.

Thus, S.D., s, and $\sigma$ do not mean the same thing in statistical theory.
It would be preferable to use S.D. and only S.D. to denote the standard
deviation of a given series. However, the symbol S.D. is clumsy,
particularly when a subscript or superscript (e.g., S.D.$_x$ or S.D.$^2$) is
needed. Consequently, we shall use s to denote the standard deviation
in the sample. In later chapters it will be necessary to distinguish
between s as a descriptive measure of variability and s' as an estimate
of the standard deviation in the population. In the meantime, s will
always refer to the sample standard deviation.
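
The two routes to the estimate of the population standard deviation can be compared numerically. This sketch uses the five scores of the earlier illustration and the standard `statistics` module, in which `pstdev` divides the sum of squares by N and `stdev` divides it by N - 1.

```python
import math
from statistics import pstdev, stdev

scores = [22, 20, 25, 30, 18]
n = len(scores)

s = pstdev(scores)                            # sample standard deviation, divisor N
estimate_from_s = s * math.sqrt(n / (n - 1))  # multiply by the square root of N/(N - 1)
estimate_direct = stdev(scores)               # divide the sum of squares by N - 1

print(round(s, 3), round(estimate_from_s, 3), round(estimate_direct, 3))
```

The two estimates agree, and both exceed s, by an amount that shrinks as N grows.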


COMPUTATION OF THE STANDARD DEVIATION OF UNGROUPED DATA 


The deviation-score method of finding the standard deviation,
illustrated above, tends to be laborious when the mean is not a whole
number, and an alternative method using only raw scores usually is
preferable. The formula for s computed from raw scores is

$$s = \frac{1}{N}\sqrt{N\Sigma X^2 - (\Sigma X)^2}, \tag{4.6}$$

in which $\Sigma X^2$ is the sum of the squares of the raw scores and $(\Sigma X)^2$ is
the square of the sum of the raw scores.
The scores of the preceding illustration and their squares are shown
below.

SCORE X         X²

   22          484
   20          400
   25          625
   30          900
   18          324
ΣX = 115   ΣX² = 2733

Since N = 5, ΣX = 115, and ΣX² = 2733, we have by substitution in (4.6)

$$s = \frac{1}{5}\sqrt{5(2733) - (115)^2} = \frac{1}{5}\sqrt{440} = 4.2$$


Thus, to find the standard deviation of ungrouped scores, we need only 
to sum the scores, sum the squares of the scores, and substitute in 
formula (4.6). This method is particularly convenient when a calculat- 
ing machine is available. The sum of the scores and the sum of their 
squares can be obtained simultaneously on such a machine.
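The raw-score method lends itself to a short program. The sketch below, in Python, accumulates the sum of the scores and the sum of their squares and substitutes them in formula (4.6); the function name `sd_raw` is a hypothetical label chosen for illustration.

```python
import math

def sd_raw(scores):
    """Standard deviation by the raw-score formula (4.6), divisor N."""
    n = len(scores)
    sum_x = sum(scores)                   # sum of the raw scores
    sum_x2 = sum(x * x for x in scores)   # sum of the squared raw scores
    return math.sqrt(n * sum_x2 - sum_x ** 2) / n

scores = [22, 20, 25, 30, 18]
print(sum(scores), sum(x * x for x in scores))  # the two sums entered in (4.6)
print(round(sd_raw(scores), 1))
```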

In using the raw-score method to find s, it is often possible to save
labor by subtracting a constant from each score. In the above example,
if 20 were subtracted from each score, the new scores would be somewhat
more manageable. It is left as an exercise for the student to show
that the standard deviation is not affected by decreasing each score in a
series by a constant amount.
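A numerical check, which illustrates but does not replace the proof asked of the student, may be sketched in Python. It assumes the same five scores and uses `statistics.pstdev`, the standard-library function that divides by N.

```python
import statistics

scores = [22, 20, 25, 30, 18]
shifted = [x - 20 for x in scores]   # subtract the constant 20 from each score

# pstdev divides the sum of squares by N, matching the definition of s used here.
s_original = statistics.pstdev(scores)
s_shifted = statistics.pstdev(shifted)

print(shifted)
print(s_original, s_shifted)
```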


COMPUTATION OF THE STANDARD DEVIATION OF GROUPED DATA 


Although calculating machines make it possible to find the standard
deviation of a series of scores, whether small or large, in minimum time,
it may be advisable to make a frequency distribution in inspecting and
presenting data. When scores are grouped in a frequency distribution
and assumed to have the mid-values of their respective class intervals,
the most straightforward method of finding the standard deviation is by
a. Squaring the deviations (from the mean) of the midpoints of the class
intervals.
b. Multiplying each squared deviation by the corresponding class fre- 
quency f. 
c. Summing the products. 
d. Dividing the sum of the products by N. 
e. Extracting the square root of the quotient. 
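
Steps a through e can be sketched in Python as follows. The class midpoints and frequencies are those tabulated for School C in Table 4.3; the variable names are illustrative assumptions, and the mean is computed from the grouped data rather than entered as the rounded value 93.32.

```python
import math

# (class midpoint, frequency f) for School C, as tabulated in Table 4.3
classes = [(132, 1), (127, 0), (122, 1), (117, 2), (112, 2), (107, 2),
           (102, 3), (97, 3), (92, 4), (87, 11), (82, 3), (77, 4), (72, 2)]

n = sum(f for _, f in classes)
mean = sum(mid * f for mid, f in classes) / n

# Steps a-c: square each midpoint deviation, weight it by f, and sum.
weighted_squares = sum(f * (mid - mean) ** 2 for mid, f in classes)

# Steps d-e: divide by N and extract the square root.
s = math.sqrt(weighted_squares / n)

print(n, round(mean, 2), round(s, 2))
```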


The operations may be summarized in the formula, analogous to (4.5),

$$s = \sqrt{\frac{\Sigma f x'^2}{N}}, \tag{4.7}$$

in which f is the frequency in a class and x′ is the deviation of the class
midpoint from the mean. The computation is illustrated in Table 4.3.


TABLE 4.3

Computation of Standard Deviation of Grouped Data
(Data from Table 4.1, School C, Mean IQ, 93.32)

IQ                CLASS      DEVIATION OF CLASS
CLASS      f     MIDPOINT    MIDPOINT FROM MEAN, x′       x′²           fx′²
130-134    1       132              38.68             1,496.1424    1,496.1424
125-129    0       127              33.68             1,134.3424
120-124    1       122              28.68               822.5424      822.5424
115-119    2       117              23.68               560.7424    1,121.4848
110-114    2       112              18.68               348.9424      697.8848
105-109    2       107              13.68               187.1424      374.2848
100-104    3       102               8.68                75.3424      226.0272
 95-99     3        97               3.68                13.5424       40.6272
 90-94     4        92              −1.32                 1.7424        6.9696
 85-89    11        87              −6.32                39.9424      439.3664
 80-84     3        82             −11.32               128.1424      384.4272
 75-79     4        77             −16.32               266.3424    1,065.3696
 70-74     2        72             −21.32               454.5424      909.0848
SUM       38                                                        7,584.2112

$$s = \sqrt{\frac{7{,}584.2112}{38}} = \sqrt{199.5845} = 14.13$$


Although the method illustrated in Table 4.3 may always be used
in finding the standard deviation of grouped data, and is the one to use
if class intervals are not of uniform size, there is an easier method ap- 
propriate for most distributions. When class intervals are equal, as is 
usually the case, the deviations of the midpoints from an arbitrary 
origin may be coded in class interval steps or d units and the squares 
of the deviations times their respective frequencies computed in the d 
unit. The method is most easily understood with reference to a specific 
distribution. 

Consider the distribution and the several columns shown in
Table 4.4. The f and fd columns are completed as in finding the mean
by the coded method. The entries in the fd² column are the products
of the fd's times the corresponding d's. After the sums at the foot of
Table 4.4 are obtained, they are substituted in the formula

$$s = i\sqrt{\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2}, \tag{4.8}$$

in which i is the class interval, and the other quantities refer to sums
obtained as those in Table 4.4. The application of the formula is
illustrated in the space below the table.


TABLE 4.4

Computation of Standard Deviation of Grouped Data
by Coded Method
(Data from Table 4.1, School C)

IQ CLASS     f      d      fd     fd²
130-134      1      8       8      64
125-129      0      7
120-124      1      6       6      36
115-119      2      5      10      50
110-114      2      4       8      32
105-109      2      3       6      18
100-104      3      2       6      12
 95-99       3      1       3       3
 90-94       4      0
 85-89      11     −1     −11      11
 80-84       3     −2      −6      12
 75-79       4     −3     −12      36
 70-74       2     −4      −8      32

SUM         38             10     306

$$s = 5\sqrt{\frac{306}{38} - \left(\frac{10}{38}\right)^2} = 5\sqrt{7.983} = 14.13$$

Let us summarize the coded method of calculating the standard 
deviation in the following steps: 


a. After the data are grouped in a distribution having equal class in- 
tervals, an arbitrary origin is selected, and the class midpoints are 
coded in the d unit. Work is usually saved if the origin is selected near 
the middle of the distribution. 

b. An fd column is completed, as in finding the mean by the coded 
method. 

c. An fd² column is completed by multiplying the entries in the fd
column by the corresponding d entries.

d. (Σfd/N)² is found by dividing the algebraic sum of the fd column by
N and squaring the quotient.

e. Σfd²/N is found by dividing the sum of the fd² column by N.

f. The square root of the difference Σfd²/N − (Σfd/N)² is taken.

g. The resulting root is multiplied by i, the value of the class interval.


If desired, formula (4.8) may be written

$$s = \frac{i}{N}\sqrt{N\Sigma fd^2 - (\Sigma fd)^2}, \tag{4.9}$$

a form particularly appropriate for use with a computing machine.
Applying formula (4.9) to the data of Table 4.4,

$$s = \frac{5}{38}\sqrt{38(306) - (10)^2} = \frac{5}{38}\sqrt{11528} = 14.13.$$
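
The coded method may also be sketched in Python. The d codes and frequencies below are those of Table 4.4, with the origin at the 90-94 class and i = 5; the names `s_48` and `s_49` are hypothetical labels for the results of formulas (4.8) and (4.9).

```python
import math

i = 5  # width of the class interval

# (d, f) pairs from Table 4.4; d is the coded deviation from the arbitrary origin.
coded = [(8, 1), (7, 0), (6, 1), (5, 2), (4, 2), (3, 2), (2, 3),
         (1, 3), (0, 4), (-1, 11), (-2, 3), (-3, 4), (-4, 2)]

n = sum(f for _, f in coded)
sum_fd = sum(f * d for d, f in coded)        # foot of the fd column
sum_fd2 = sum(f * d * d for d, f in coded)   # foot of the fd-squared column

s_48 = i * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)    # formula (4.8)
s_49 = (i / n) * math.sqrt(n * sum_fd2 - sum_fd ** 2)    # formula (4.9)

print(n, sum_fd, sum_fd2)
print(round(s_48, 2), round(s_49, 2))
```

The two forms are algebraically identical, so any difference between the printed values would indicate a mistake in entering the table.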


The coded method is easy to apply and greatly reduces computational
labor as well as the risk of mistakes. The standard deviation of a
long series may be found with little more effort than that of a short
series. The student should practice the method until he becomes adept
in its use.


COMBINING STANDARD DEVIATIONS 


Occasionally it is useful to be able to determine the standard
deviation of a total group of scores, given the means and standard
deviations of two or more subgroups making up the total. Since standard
deviations are algebraic quantities, they can, of course, be manipulated
algebraically. Consider the general case in which the numbers, means,
and standard deviations of k subgroups are

Subgroup 1     $N_1$     $M_1$     $s_1$
Subgroup 2     $N_2$     $M_2$     $s_2$
Subgroup k     $N_k$     $M_k$     $s_k$

Let us represent the mean and the standard deviation of the total group
by $M_T$ and $s_T$, respectively. The mean $M_T$ may be determined by an easy
extension of formula (3.5), p. 55, and the standard deviation by the
formula

$$s_T = \sqrt{\frac{N_1(s_1^2 + M_1^2) + N_2(s_2^2 + M_2^2) + \cdots + N_k(s_k^2 + M_k^2)}{N_1 + N_2 + \cdots + N_k} - M_T^2}. \tag{4.10}$$


To illustrate the use of the formula, we return to the distributions
of Table 4.1. The means and standard deviations of the grouped
intelligence quotients are brought together below.

                               STANDARD
SCHOOL    NUMBER     MEAN     DEVIATION
  B         18       93.94      10.02
  C         38       93.32      14.13
  G         32       91.22      15.96
  J         29       89.41       9.70

By extension of formula (3.5)

$$M_T = \frac{18 \times 93.94 + 38 \times 93.32 + 32 \times 91.22 + 29 \times 89.41}{18 + 38 + 32 + 29} = 91.87$$

Substituting the given means and standard deviations in formula (4.10),
we have

$$s_T^2 = \frac{18(10.02^2 + 93.94^2) + 38(14.13^2 + 93.32^2) + 32(15.96^2 + 91.22^2) + 29(9.70^2 + 89.41^2)}{18 + 38 + 32 + 29} - 91.87^2$$

so that $s_T$ = 13.29.
The student can verify that when the four distributions in Table 4.1
are combined, the mean and standard deviation of the combined
distributions, determined directly, are 91.87 and 13.28, the discrepancy
between the two s values being due to rounding.
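
Formula (4.10) and the extension of formula (3.5) can be sketched in Python. The function name `combine` and its argument layout are assumptions made for illustration. The sketch carries the total mean at full precision instead of rounding it to 91.87 before squaring, so its result falls nearer 13.28 than 13.29, which bears out the remark about rounding.

```python
import math

def combine(groups):
    """Mean and S.D. of a total group from (N, M, s) triples of its subgroups."""
    n_total = sum(n for n, _, _ in groups)
    # Extension of formula (3.5): weighted mean of the subgroup means.
    m_total = sum(n * m for n, m, _ in groups) / n_total
    # Formula (4.10): pooled mean square about zero, less the squared total mean.
    mean_square = sum(n * (s ** 2 + m ** 2) for n, m, s in groups) / n_total
    return m_total, math.sqrt(mean_square - m_total ** 2)

# (number, mean, standard deviation) for Schools B, C, G, and J
schools = [(18, 93.94, 10.02), (38, 93.32, 14.13),
           (32, 91.22, 15.96), (29, 89.41, 9.70)]

m_t, s_t = combine(schools)
print(round(m_t, 2), round(s_t, 2))
```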

The methods of finding the mean and standard deviation of a total 
group from those of subgroups are useful chiefly in two situations: 
(1) when it is desired to determine the total mean and standard devia- 
tion when all that remains of the original data are the numbers, means, 
and standard deviations of subgroups, and (2) when it is desired to de- 
termine the effect of new groups upon old means and standard devia- 
tions without making up new distributions; for example, when it is 
desired to revise test norms. 

It should be noted again that one of the great advantages the mean 
and standard deviation have over other measures of central tendency 
and variability lies in their algebraic nature. 


USES AND LIMITATIONS OF THE STANDARD DEVIATION 


The standard deviations of the distributions of Table 4.1, listed 
above, indicate that the IQ's of the G distribution are the most variable 
and those of the J distribution the least variable about the respective 
means. As a descriptive measure of variability, the standard deviation 
is always interpreted in this way, i.e., the greater its value, the more the 
scores scatter on an average from their mean.

If we lay off distances equal to 1σ* above and below the mean of a
normal distribution, the interval includes 68.3 per cent of the scores or
items; the interval ±2σ includes 95.4 per cent; and the interval ±3σ
includes 99.7 per cent. These percentages, as well as those included by
±.5σ, ±1.5σ, and ±2.5σ, are shown in Figure 4.1. The fact that σ
intervals include fixed percentages of items in any normal distribution
is of fundamental importance in statistical theory. The proportions of
area included by successive .01σ distances from the mean are shown in
Table C, Appendix. Since the area under the polygon of the normal
distribution corresponds to the total number of items, N, of the
distribution, the proportions of Table C make it possible to determine the
number of items falling within any σ-unit interval.
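
These percentages can be reproduced without a table. The sketch below, in Python, relies on the standard relation between the normal curve and the error function, `math.erf`; the function name `pct_within` and the illustrative N of 1,000 are assumptions, not part of the text.

```python
import math

def pct_within(k):
    """Percentage of a normal distribution lying within k sigma of the mean."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"±{k}σ includes {pct_within(k):.1f} per cent")

# Number of items expected within ±1σ in a normal distribution of N items.
n_items = 1000  # hypothetical N, for illustration only
print(round(n_items * pct_within(1.0) / 100))
```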

One of the most interesting and useful applications of the standard
deviation in the normal distribution relates to determining the chances
that a random score or statistic will not deviate from its expected value
by more than some specified amount. To illustrate, if we select a score
at random from a normal distribution, the chances are about 95 in 100
that it will not deviate from the mean by more than ±2σ. This is
because only about 5 per cent of the scores in a normal distribution lie
outside the range ±2σ from the mean. In later chapters we shall use
this fact in estimation and in testing hypotheses.


[Figure: normal curve; horizontal axis marked in steps of .5σ from −3.0σ through M = 0 to +3.0σ]

Fig. 4.1. Percentages of scores included within ±.5σ, ±1.0σ, ±1.5σ,
±2.0σ, ±2.5σ, and ±3.0σ of the arithmetic mean in a normal distribution.


* σ instead of s is used here for the standard deviation, since we are thinking
about a theoretically normal distribution rather than a sample distribution.
Standard deviation intervals include invariant percentages of items
only in the normal distribution. However, unless distributions
are characterized by little or no “piling up” or by extreme values,
there is a marked tendency for about 2/3 of the items to fall
in the interval M ± 1s, about 95 per cent to fall in the interval M ± 2s,
and for all to fall in the interval M ± 3s. (See Ex. 15.)

The standard deviation, as an algebraic quantity, has many uses
denied other measures of variability. As our study of statistics
proceeds, we shall find that it deserves its place as the most used measure
of variability. It not only is the most trustworthy of the measures,
as a rule, but is indispensable in correlational analysis, in
item analysis, and in judging the reliability of statistical
inferences. These uses will be taken up later.


The standard deviation has an interesting property, one which is 
highly important in statistical theory. In any series, the sum of squares 
of deviations of the items from their arithmetic mean is less than the 
sum of squares of the items from any other value. As a consequence, 
the standard deviation is less than any similar “root-mean-square.” 
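
The property is easily illustrated. The Python sketch below assumes the five scores used earlier in the chapter and compares the sum of squares about the mean with the sums of squares about several other values chosen arbitrarily; it is an illustration, not a proof.

```python
scores = [22, 20, 25, 30, 18]
mean = sum(scores) / len(scores)

def sum_of_squares_about(point):
    """Sum of squared deviations of the scores from an arbitrary point."""
    return sum((x - point) ** 2 for x in scores)

about_mean = sum_of_squares_about(mean)

# Arbitrary comparison points on either side of the mean.
about_others = {c: sum_of_squares_about(c) for c in (20, 22, 24, 26)}

print(mean, about_mean)
print(about_others)
```

Every comparison value printed exceeds the sum of squares about the mean, and the excess grows with the distance of the chosen point from the mean.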

The standard deviation has two distinct limitations. It is extremely 
difficult to interpret to those untrained in statistics, and as a rule should 
not be used in describing statistical data to the untrained. Its second 
limitation arises from its sensitivity to extreme values. By studying 
several simple series, the student can easily satisfy himself that the 
standard deviation is more affected by extreme values than other 
measures, with the exception of the overall range. Hence it is of doubt- 
ful propriety for series containing a few extreme values relative to the 
majority. 

As will be seen in a later section, the standard deviation is a mem- 
ber of the “moments” system. Since the mean is also a member of the 
system, the mean and standard deviation are ordinarily used together. 
When the former is an appropriate measure of central tendency, the 
latter is an appropriate measure of variability. 


Interpretation and Use of Measures of Variability 


It has been said that the most popular use of an average is to con- 
ceal variability. Although the remark is facetious, an average does 
conceal variability, and hence, unless supplemented by other informa- 
tion, presents an indefinite and perhaps distorted picture. As has been 
emphasized in the preceding pages, the extent and manner of the scat- 
tering of items about their average value should always be taken into 
account in describing and analyzing a series. 

In this section we shall summarize the more important properties 
and uses of measures of variability in analyzing statistical data. 


PROPERTIES OF MEASURES OF VARIABILITY 


Measures of variability are merely statistics that summarize the 
amount or extent of variation. Each measure differs from the others 
because each summarizes variation in a different way. In contrast with 
the point nature of averages, measures of variability may best be in- 
terpreted as distances on the scale of scores. The interpercentile meas- 
ures, as derived, are such distances; the average and standard deviations 
may be thought of as such. In a normal distribution, the mean, median,
and mode coincide, i.e., they have the same value. Although this is not
true of the measures of variability, the latter do have a constant
relationship. In the normal distribution, it can be shown that the following
relationships obtain:

σ  = 1.253AD        σ  = 1.483Q
AD = .798σ          AD = 1.183Q
Q  = .675σ          Q  = .845AD

It also is true in normal distributions that σ, AD, and Q distances
from the mean include the following percentages of items (cf. Figure
4.1):

1σ below mean to 1σ above mean includes about 68 per cent.
2σ below mean to 2σ above mean includes about 95 per cent.
3σ below mean to 3σ above mean includes about 99.7 per cent.

1AD below mean to 1AD above mean includes about 58 per cent.
2AD below mean to 2AD above mean includes about 89 per cent.
3AD below mean to 3AD above mean includes about 98 per cent.

1Q below mean to 1Q above mean includes 50 per cent.
2Q below mean to 2Q above mean includes about 82 per cent.
3Q below mean to 3Q above mean includes about 96 per cent.
4Q below mean to 4Q above mean includes about 99.3 per cent.
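
The constants and percentages for the normal distribution can be recomputed in Python with `statistics.NormalDist` (available in Python 3.8 and later). The sketch assumes the known results that, in a normal distribution with σ = 1, AD equals √(2/π) and Q is the distance from the mean to the third quartile; the variable names are illustrative.

```python
import math
from statistics import NormalDist

z = NormalDist()               # standard normal distribution, sigma = 1
ad = math.sqrt(2 / math.pi)    # average deviation in sigma units
q = z.inv_cdf(0.75)            # quartile deviation in sigma units

def pct_within(k):
    """Percentage of the normal distribution within k sigma units of the mean."""
    return 100 * (z.cdf(k) - z.cdf(-k))

print(f"AD = {ad:.3f} sigma, Q = {q:.3f} sigma")
print(f"sigma = {1 / ad:.3f} AD = {1 / q:.3f} Q")
print(f"AD = {ad / q:.3f} Q, Q = {q / ad:.3f} AD")

for label, unit, multiples in (("AD", ad, (1, 2, 3)), ("Q", q, (1, 2, 3, 4))):
    for m in multiples:
        print(f"±{m}{label} includes {pct_within(m * unit):.1f} per cent")
```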

Although these facts hold exactly only for normal distributions,
it is a matter of common experience that, for the great majority of
distributions encountered in research work, they hold sufficiently to be
generally useful in interpretation and analysis. If we have, for example,
a large, approximately normal distribution of measures of some ability,
with a mean of, say, 70 and a standard deviation of 10, we know that 
the measures range from about 40 to 100, that about 95 per cent of them 
fall in the interval 50-90, and about 2/3 in the interval 60-80. If we 
have a second distribution of measures of the same ability with a mean 
of 70 and a standard deviation of, say, 5, we know that the first distribu- 
tion includes a great many more able and less able individuals. (If 
desired, Q or AD instead of s intervals can be employed in similar 
analysis.) 

The measures of skewness and kurtosis which we shall consider in 
the next section provide quantitative description of nonnormal variation 
and are helpful in determining whether a distribution departs too 
greatly from normality to be analyzed by standard methods.




APPROPRIATE APPLICATIONS 


In deciding which measure of variability to use in a given situation, 
several considerations need to be taken into account, although it may 
not be possible to meet all of them. 

In reporting to people untrained in statistics, the range or average 
deviation will be best understood. If a series is such that the median is 
the appropriate average, Q or some other percentile measure ordinarily
should be used. Likewise, the mean and standard deviation are usually 
used together. In many situations it is advisable to use two measures 
of variability, one for simple descriptive purposes, the other as a basis 
for more exact and extended analysis. 

As in the case of averages, the best aid in selecting a measure of 
variability is familiarity with the advantages and limitations of the 
various measures, recognition of the purposes to be accomplished, and 
precaution against presenting a misleading picture. 

In summary statement: 


The overall range is too much affected by the chance position of the 
highest and lowest values in a series, and tells too little about the varia- 
tion of intermediate items to be useful except for rough purposes. It is 
chiefly useful as a supplementary measure in description. 

The interpercentile range measures, such as Q and D, team with the 
median. They are not affected by extreme values and are usually ap- 
plicable to open-end distributions and to distributions having unequal 
class intervals. Ordinarily, the points Q₃ and Q₁ or P₉₀ and P₁₀ should
be reported as well as Q or D. 

The average deviation is the most direct and easily interpreted repre- 
sentative measure. It is affected by all deviations, but is not as sensitive 
to extreme deviations as the standard deviation. It tends to be almost as 
reliable as the standard deviation, in the sense of being characterized by 
little fluctuation from sample to sample drawn from the same population. 

The standard deviation is the most used and useful of the measures.
It is generally the most reliable, it enters into further statistical analysis 
at various points, and is tractable in mathematical discussion. In the 
latter respect, it is unique among measures of variability. It teams with 


the mean. As a rule it should be used unless there is good reason for not 
using it. 


Skewness and Kurtosis 


In an earlier chapter, pp. 37-38, the meaning of skewness and 
kurtosis was touched upon. We are now ready to examine the concepts 
more fully. Skewness and kurtosis depend upon the manner in which
the scores in a series scatter about the average value. When the scatter
is greater on one side of the point of central tendency than on the other, 
the distribution is skewed. When there is high concentration of the 
scores in the neighborhood of the point of central tendency, the dis- 
tribution is relatively narrow across the shoulders, or leptokurtic; when 
there is low concentration of scores in the neighborhood of the point,
the distribution is relatively broad across the shoulders, or platykurtic. 

There are several methods of judging whether a distribution lacks 
normal symmetry. When the mean is larger than the median, the dis- 
tribution is positively skewed (see Figure 4.2); when the mean is smaller 
than the median, the distribution is skewed negatively. When Q₃ − Q₂
is greater than Q₂ − Q₁, positive skewness is indicated; when it is
smaller, negative skewness is indicated. Actually, skewness may
ordinarily be detected by inspection of the frequency polygon. It is not
possible to detect departure from normal peakedness by inspection, 
however, since apparent peakedness may result from choice of dimen- 
sions for the polygon. When quartile or standard deviation intervals 
do not include “normal” percentages of scores, the distribution is 


either skewed or nonnormally peaked or both. 


[Figure: frequency polygon; vertical axis, Frequency; horizontal axis, Score; the mean is marked at M = 550.37]

Fig. 4.2. Positions of mean and median in the skewed distribution. (Data
from Table 2.2)




Such methods of describing departures from normality, although 
useful in preliminary analysis, are rough and do not permit satisfactory 
comparisons. In order to describe and compare distributions exactly, 
we must have measures of skewness and kurtosis as well as measures 
of central tendency and variability. 

There are various simple measures of skewness and kurtosis based 
on considerations similar to those mentioned above, but they are neither 
sensitive nor trustworthy indications of whether sample departures from 
normality are sufficient to discredit the assumption that the population 
is normal. Far more useful are the measures based on moments. 


MOMENT MEASURES OF SKEWNESS AND KURTOSIS


In mechanics the term moment is used to denote a measure of the 
tendency of a force to cause rotation of an object about a point. Since 
the strength of the tendency depends upon the amount of the force 
and the distance from the point at which the force acts, a moment is the 
product of force times distance. When the sum of the moments tending 
to cause rotation in one direction is equal to the sum of the moments 
tending to cause rotation in the opposite direction, the object is in 
balance. 

Now an item of a statistical series may be thought of as a unit 
force acting at a distance x from the arithmetic mean, i.e., as a moment 
of force. Since the sum of negative deviations from the mean is equal 
to the sum of positive deviations, the mean is analogous to a point of 
balance. In statistics the algebraic sum of the distances or deviations 
from the mean divided by N, Σx/N, is called the first moment of the
series. 

When the deviations x of the items are squared, summed, and 
divided by N, the quotient is called the second moment of the series. 
The third and fourth moments are based upon the third and fourth 
powers, respectively, of the deviations. These moments about the mean 
may be designated by the letter m with appropriate subscript. In this 
notation, the first four moments of a series are 


$$m_1 = \frac{\Sigma x}{N} = 0; \quad m_2 = \frac{\Sigma x^2}{N} = s^2; \quad m_3 = \frac{\Sigma x^3}{N}; \quad m_4 = \frac{\Sigma x^4}{N}.$$


It will be noted that m₂ is equal to the variance.

The moments of a statistical series are important because they 
permit precise and sensitive measures of departure from normal form. 
(In advanced statistics, the higher moments are used to distinguish
types of frequency distributions.) The measure of skewness, g₁,*
derived from moments, is given by the formula

$$g_1 = \frac{m_3}{m_2\sqrt{m_2}} \tag{4.11}$$

and the measure of kurtosis, g₂,* by the formula

$$g_2 = \frac{m_4}{m_2^2}. \tag{4.12}$$

In a normal distribution g₁ = 0 and g₂ = 3. A negative value
of g₁ indicates left-hand or negative skewness; a positive value,
right-hand or positive skewness. The greater the departure from zero, the
greater the skewness. When g₂ exceeds 3, the distribution is leptokurtic;
when g₂ is less than 3, the distribution is platykurtic. The greater the
difference between g₂ and 3, the more pronounced the leptokurtosis
or the platykurtosis, as the case may be.
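
For an ungrouped series the moments and the two measures can be computed directly from their definitions. The Python sketch below is illustrative only: the function names are hypothetical, the first series is an arbitrary symmetrical one, and the second is series B of Exercise 9, which is far too short for the measures to be reliable but shows the sign of g₁ when one extreme high score is present.

```python
def moment(scores, k):
    """k-th moment about the mean: sum of k-th powers of deviations over N."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** k for x in scores) / n

def skewness_and_kurtosis(scores):
    """Return (g1, g2) by formulas (4.11) and (4.12)."""
    m2, m3, m4 = (moment(scores, k) for k in (2, 3, 4))
    g1 = m3 / (m2 * m2 ** 0.5)
    g2 = m4 / m2 ** 2
    return g1, g2

symmetrical = [1, 2, 3, 4, 5]                          # arbitrary symmetrical series
series_b = [30, 30, 32, 33, 34, 35, 35, 35, 36, 60]    # one extreme high score

print(moment(symmetrical, 1))               # first moment about the mean
print(skewness_and_kurtosis(symmetrical))   # g1 is zero for a symmetrical series
print(skewness_and_kurtosis(series_b))      # g1 is positive
```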


CALCULATION OF MOMENTS FOR GROUPED DATA 
Measures of skewness and kurtosis tend to be unreliable in small
samples, and ordinarily the measures are worth computing only when a
statistical series is long enough to warrant grouping.

When data are grouped in class intervals of constant size i and
coded in the d unit, the formulas for the second, third, and fourth
moments about the mean are

$$m_2 = \left[\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2\right] i^2 \tag{4.13}$$

$$m_3 = \left[\frac{\Sigma fd^3}{N} - 3\left(\frac{\Sigma fd^2}{N}\right)\left(\frac{\Sigma fd}{N}\right) + 2\left(\frac{\Sigma fd}{N}\right)^3\right] i^3 \tag{4.13$'$}$$

$$m_4 = \left[\frac{\Sigma fd^4}{N} - 4\left(\frac{\Sigma fd^3}{N}\right)\left(\frac{\Sigma fd}{N}\right) + 6\left(\frac{\Sigma fd^2}{N}\right)\left(\frac{\Sigma fd}{N}\right)^2 - 3\left(\frac{\Sigma fd}{N}\right)^4\right] i^4 \tag{4.13$''$}$$

The application of the formulas, although entailing rather laborious
computations, is not difficult. The values for substitution in the
formulas are ordinarily quite easily obtained, as illustrated in Table 4.5.
Except for the columns headed fd³ and fd⁴, the computational layout
in Table 4.5 is exactly like that used in finding the standard deviation
of grouped and coded data. The entries in column fd³ are obtained by
multiplying the entries in the fd² column by their respective d values,
with attention to sign, and the entries in the fd⁴ column are obtained
by multiplying by d again. The various sums are divided by N and
substituted in formulas (4.13), as shown in the space below the table. The
student is cautioned to exert care in dealing with the signs of the
substituted values. Most errors in computing moments arise from
carelessness with signs.


TABLE 4.5

Computation of Moments for Grouped Data
(Data from Table 2.2, p. 27)

SCORE        f      d      fd      fd²      fd³       fd⁴
780-809      1      8       8       64      512      4,096
750-779      2      7      14       98      686      4,802
720-749      3      6      18      108      648      3,888
690-719      6      5      30      150      750      3,750
660-689      7      4      28      112      448      1,792
630-659     12      3      36      108      324        972
600-629      8      2      16       32       64        128
570-599     15      1      15       15       15         15
540-569     14      0
510-539     16     −1     −16       16      −16         16
480-509     24     −2     −48       96     −192        384
450-479     14     −3     −42      126     −378      1,134
420-449      7     −4     −28      112     −448      1,792
390-419      6     −5     −30      150     −750      3,750
360-389      1     −6      −6       36     −216      1,296
330-359      2     −7     −14       98     −686      4,802
SUM        138            −19    1,321      761     32,617

Σfd/N = −19/138 = −.14          Σfd²/N = 1,321/138 = 9.57

Σfd³/N = 761/138 = 5.51         Σfd⁴/N = 32,617/138 = 236.36

Substituting in formulas (4.13)

m₂ = [9.57 − (−.14)²]i² = 9.55i²
m₃ = [5.51 − 3(9.57)(−.14) + 2(−.14)³]i³ = 9.53i³
m₄ = [236.36 − 4(5.51)(−.14) + 6(9.57)(−.14)² − 3(−.14)⁴]i⁴ = 240.57i⁴


* The symbols α₃ and √β₁ are sometimes used instead of g₁ to identify this
measure of skewness, and the symbols α₄ and β₂ instead of g₂ for kurtosis.

There is rarely any need to substitute the value of the class interval i
and express the moments in the original unit of measurement. When
the moments are substituted in formulas (4.11) and (4.12), the i's cancel
out, as shown below. By formula (4.11) the skewness of the distribution
of Table 4.5 is

$$g_1 = \frac{9.53i^3}{9.55i^2\sqrt{9.55i^2}} = \frac{9.53}{9.55\sqrt{9.55}} = .32$$

and by formula (4.12) the kurtosis is

$$g_2 = \frac{240.57i^4}{(9.55i^2)^2} = \frac{240.57}{(9.55)^2} = 2.64.$$


Hence the distribution is positively skewed and platykurtic. 
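
The computation of Table 4.5 can be sketched in Python. The d codes and frequencies are those of the table; the moments are left in the d unit, since i cancels in g₁ and g₂. The names `a1` to `a4` are hypothetical labels for the four sums divided by N. Because the sketch carries these quotients at full precision rather than to two decimals, its m₃ comes out near 9.46 rather than 9.53, and g₂ near 2.63 rather than 2.64; the conclusion is unchanged.

```python
# (d, f) pairs from Table 4.5; the origin is the 540-569 class.
rows = [(8, 1), (7, 2), (6, 3), (5, 6), (4, 7), (3, 12), (2, 8), (1, 15),
        (0, 14), (-1, 16), (-2, 24), (-3, 14), (-4, 7), (-5, 6), (-6, 1), (-7, 2)]

n = sum(f for _, f in rows)
sums = [sum(f * d ** k for d, f in rows) for k in (1, 2, 3, 4)]
a1, a2, a3, a4 = (total / n for total in sums)

# Formulas (4.13), with the factor i omitted because it cancels in g1 and g2.
m2 = a2 - a1 ** 2
m3 = a3 - 3 * a2 * a1 + 2 * a1 ** 3
m4 = a4 - 4 * a3 * a1 + 6 * a2 * a1 ** 2 - 3 * a1 ** 4

g1 = m3 / (m2 * m2 ** 0.5)   # formula (4.11)
g2 = m4 / m2 ** 2            # formula (4.12)

print(n, sums)
print(round(m2, 2), round(m3, 2), round(m4, 2))
print(round(g1, 2), round(g2, 2))
```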


USES OF MEASURES OF SKEWNESS AND KURTOSIS 


Although a full appreciation of measures of skewness and kurtosis 
depends on knowledge regarding theoretical frequency curves, a special 
and highly important case of which is the normal curve, some idea 
about their use is not difficult to grasp. These measures indicate the 
extent of nonnormal variation in a statistical series. They, like the mean 
and standard deviation, are members of the moments system. The four 
quantities, M, s, g₁, and g₂ are sometimes called the descriptive constants
of the frequency distribution. The four constants and the number of
cases N convey all of the information ordinarily needed to understand
and interpret unimodal distributions.

Although measures of skewness and kurtosis are not serviceable in
the many ways that measures of central tendency and variability are, they
are more fundamental than the latter, in the sense that they
quantitatively indicate departure from normality. The great majority of
techniques applied to sampling problems presuppose normality at some
point in their application. Moreover, some of the simple descriptive
measures of central tendency and variability, as we have seen, tend to
lose meaning as a distribution departs from normality. The type of
distribution is usually of first concern in statistical analysis.

Measures of skewness and kurtosis permit comparisons of the
shape of two or more given distributions. As descriptive measures they
supplement measures of central tendency and variability in that
they provide information regarding the manner in which the scores in a
series are scattered about the average value. As noted above, if a
distribution is sensibly unimodal, it may be comprehensively described in
terms of the four measures, M, s, g₁, and g₂.

Measures of skewness and kurtosis are also an aid in making
inferences from sample data regarding the form of population distributions.
This is an application we shall take up in a later chapter.


EXERCISES


1. Why are measures of variability needed?

2. The median IQ in School C, Table 4.1, is 89.05. The difference between
the median and Q₃ is 12.95; the difference between the median and Q₁
is 4.32. What does this indicate regarding the shape of the distribution?

3. Compute the quartile deviation, Q, of one or more of the distributions in
Table 3.1. In what way does Q supplement the median in the description
of a distribution? What conclusions about a distribution can you draw
from Q₁, Q₂ (median), Q₃, and Q?

4. Given the information that in a distribution of salaries P₁₀ = $4,000,
P₂₅ = $4,250, P₅₀ = $4,450, P₇₅ = $5,250, and P₉₀ = $7,250, make
a rough sketch of the distribution.

5. In what sort of distribution does the interval Mdn ± Q contain exactly
the middle 50 per cent of the scores, and the interval Mdn ± D the
middle 80 per cent of the scores?

6. A common method of constructing a scale for measuring attitude toward,
say, social security is to have a large number of judges rate statements of
opinion about social security from 1 to 11, according to the favorableness
of the opinions. The statements are then assigned numerical values equal
to the respective medians of the judges' ratings. Suppose that among the
statements which have been rated by 50 judges, the ratings of statements i
and j were distributed as shown below. If only one of the two statements
were to be included in the scale, which should it be? Why?


JUDGE'S RATING OPINION 
i j 

7 3 
6 4 

5 13 11 
4 24 12 
3 6 9 
2 2 8 
1 1 


7. Compute the average deviation, AD, of the IQ's in the School J
distribution of Table 4.1. Interpret AD. What does the word average refer to in
average deviation?

8. The sum of squares of a series of 36 items is 1,296. What are the variance
and standard deviation?

9. Find the average and standard deviations of series A and B below. Which
is more affected by the extreme score in the B series?

A: 30 30 32 33 34 35 35 35 36 40
B: 30 30 32 33 34 35 35 35 36 60

10. Why is the standard deviation not an appropriate measure of the
variability of the distribution of salaries in Exercise 7, Chapter III?

11. In what sense is the standard deviation an average of deviations from the
mean?

12. What is the effect on the standard deviation of adding or subtracting a
constant to or from each score in a series? Of multiplying or dividing
each score by a constant? (Experiment with several simple series, like
those of Exercise 9, above, or devise a general proof.)

13. Verify that the mean and standard deviation of the distribution formed by
combining the four distributions of Table 4.1 are 91.87 and 13.28,
respectively.

14. The means and standard deviations of mathematical aptitude test (MAT)
scores and numbers of freshmen in three colleges of a university are shown
below. Find the mean, $M_T$, and the standard deviation, $s_T$, of MAT for the
three groups combined. Interpret these.

        LIBERAL ARTS    BUSINESS    ENGINEERING
N           150            200           50
M           600            560          700
s            60             80           50


al school applicants on the Miller Survey of Mechan- 
ean and the standard devia- 
ages of scores in the 


The scores of 67 dent 
ical Insight are shown below. (a) Find the m 
tion of the distribution. (b) Estimate the percent 
intervals M + 1s, M + 2s, M + 3s. 


SCORE f 
32-35 4 
28-31 3 
24-27 12 
20-23 10 
16-19 16 
12-15 11 

8-11 6 

47 3 

0-3 2 


Try to find or construct one or more frequency distributions in which the 
er cent of the scores or more than 


interval M == 15 includes less than 60 p! 

75 per cent. Sketch the histogram or polygon of the distribution(s). 
In a normal distribution of 1,000 scores, how many scores are within 
and how many outside each of the following intervals: M = .5s, 
M + 1.55, M + 2.55? 
Verify that the ranges, Q’s, 4 
are as shown below. With referen 
ties of the measures, account for t 


D's, and s's of the distributions of Table 4.1 
ce to the distributions and the proper- 
he contradictory evidence regarding 


variability. 
DISTRIBUTION RANGE Q AD s 
B 33 9.28 9.04 10.02 
€ 57 8.64 11.40 14.13 
G 71 11.88 12.49 15.96 
J 40 5.70 7.50 9.70 


19. Compute and interpret the g1 and g2 measures of skewness and kurtosis 
of the MAT scores of Exercise 7, Chapter II. 

20. The distribution of pretest scores of a large group in an experiment 
was unimodal with M = 50.0, s = 5.0, g1 = −.04, and g2 = 3.10. The 
distribution of posttest scores was unimodal with M = 54, s = 8.0, 
g1 = .70, and g2 = 2.25. Make rough sketches of the polygons of the 
distributions and discuss the results of the experiment. 


CHAPTER V 


Transformations of Scores 


Transformations are made by changing observed or raw scores in some 
systematic way. Their main purposes are (1) to simplify the data 
and computations, as when scores are reduced by a constant amount 
before the mean and standard deviation are computed, (2) to make 
scores from different instruments comparable, and (3) to change the 
shape of a distribution. 

In this chapter we shall be concerned mainly with the second 
purpose, namely, that of transforming raw scores to comparable 
scores. Such transformations are necessary when we wish to compare 
or combine an individual's scores on different tests or instruments, 
since raw scores are not comparable. 

Although there are various kinds of transformations of raw scores 
to comparable scores, we shall be concerned here with only the two 
that are most widely used in statistical work: percentile ranks and 
standard scores. 


Percentile Ranks 


When the score of an individual in a group is expressed as the 
percentage of the group which the individual exceeds, the percentage 
is called a percentile rank. Thus, if an individual has a percentile rank 
of 60, we know that he exceeds 60 per cent of the individuals with whom 
he is being compared. The term percentile rank is a suitable one, for 
it expresses the idea of position or rank on a scale of 100. 


COMPUTATION OF PERCENTILE RANKS 


The percentile rank of a given raw score may be quickly ap- 
proximated from the cumulative percentage curve, discussed below, 
or it may be determined exactly by an arithmetic procedure. Let us 
find the exact percentile rank of a score of 38 in the distribution of 
293 reading scores of Table 5.1. The score falls in the class interval 


TABLE 5.1 
Distribution of 293 Reading Scores 


SCORE     f   CUM f      SCORE     f   CUM f 
57-59     1    293       30-32    35    114 
54-56     0    292       27-29    29     79 
51-53     0    292       24-26    14     50 
48-50    17    292       21-23    16     36 
45-47    26    275       18-20    11     20 
42-44    25    249       15-17     3      9 
39-41    33    224       12-14     4      6 
36-38    33    191        9-11     1      2 
33-35    44    158        6-8      1      1 


35.5-38.5; hence it clearly exceeds the 158 scores falling below that 
class interval. To find how many scores lie below 38 in the 35.5-38.5 
class, we assume that the 33 scores in that class are distributed evenly 
over the interval and interpolate. As shown in Figure 5.1, the point 


Fig. 5.1. Interpolation in the class interval containing score whose percentile 
rank is to be determined. (From Table 5.1.) [Figure: score axis marked at 
35.5, 38, and 38.5; 158 scores fall below 35.5, and 33 scores lie in the 
interval 35.5-38.5.] 


38 includes 2.5/3 of the 33 or 27.5 scores in the 35.5-38.5 class. In all, 
then, there are 185.5 scores falling below the score 38, and the 
percentile rank of 38 is 185.5/293 × 100 or 63.3. This means simply 
that an individual having a score of 38 on the test exceeds 63.3 per 
cent of the group. Percentile ranks ordinarily are reported to the 
nearest whole number. 

The arithmetic work in computing the percentile rank of any 
score in a given distribution may be summarized in the formula 


$$PR(X) = \frac{100}{N}\left[F + \frac{(X - L)f}{i}\right], \qquad (5.1)$$

where X is the score whose percentile rank is desired; N is, as always, 
the number of scores in the distribution; F is the cumulative frequency 
up to the class containing X; L is the lower real limit of the class 
containing X; f is the frequency in the class containing X; and i is, as 
always, the class interval. 

Let us illustrate the formula by finding again the percentile rank 
of a score of 38 in the distribution of Table 5.1. Since N = 293, 
F = 158, X = 38, L = 35.5, f = 33, and i = 3, we have by 
substitution, 

$$PR(38) = \frac{100}{293}\left[158 + \frac{(38 - 35.5)33}{3}\right] = \frac{100}{293}(158 + 27.5) = 63.3,$$

which would be reported 63. 

The student should note the converse relationship between 
percentile and percentile rank. The former is a score below which a 
specified percentage falls; the latter is the percentage below a specified 
score. 
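
The arithmetic of formula (5.1) can also be sketched in a few lines of Python. The listing below is an illustration only: the function name `percentile_rank` and the layout of `TABLE_5_1` are our own assumptions, and the real limits of each class are taken to lie .5 below and above its score limits, as in the worked example.

```python
# Table 5.1 as (lowest score, highest score, frequency), lowest class first.
TABLE_5_1 = [
    (6, 8, 1), (9, 11, 1), (12, 14, 4), (15, 17, 3), (18, 20, 11),
    (21, 23, 16), (24, 26, 14), (27, 29, 29), (30, 32, 35), (33, 35, 44),
    (36, 38, 33), (39, 41, 33), (42, 44, 25), (45, 47, 26), (48, 50, 17),
    (51, 53, 0), (54, 56, 0), (57, 59, 1),
]


def percentile_rank(score, classes):
    """Percentile rank by formula (5.1): (100/N) * [F + (X - L) * f / i]."""
    n = sum(f for _, _, f in classes)
    cum_below = 0  # F: cumulative frequency below the class containing X
    for low, high, f in classes:
        lower_real = low - 0.5   # L: lower real limit (assumed .5 below)
        upper_real = high + 0.5
        if lower_real <= score < upper_real:
            width = upper_real - lower_real  # i: the class interval
            return 100.0 / n * (cum_below + (score - lower_real) * f / width)
        cum_below += f
    raise ValueError("score lies outside the tabulated classes")


for x in (38, 35, 18, 15):
    print(x, round(percentile_rank(x, TABLE_5_1), 1))
```

For a score of 38 the function reproduces the 63.3 found above; the other three scores are included only to show that the same call serves any score in the table.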


PERCENTILE RANK IN ORDERED DATA 

There are a great many situations in which individuals or objects 
are "ordered" rather than "measured" on some trait. For example, 
N students may be ranked from "1" highest to "N" lowest in respect 
to initiative. Occasionally, test results are better thought of as ordered 
data, rather than as definite points on a scale of scores. High school 
and college class ranks obviously are a special case of ordered data. 

It is sometimes useful to transform such ranks to percentile scores. 
When this is done, the rank order of an individual with respect to 
some trait can be directly compared with his performance on tests 
which have percentile norms. Also, comparisons between ranked 
individuals can be made, regardless of differing numbers in groups. 
For example, a college admissions office may change high school 
class rank to percentile class rank in order to eliminate the effect of 
class size. (See Table A, Appendix.) 

Suppose we have a group of 12 individuals who have been ranked 
from 1 to 12 in respect to some trait. A histogram of the data, since 
1 is highest and 12 is lowest, would be constructed as shown in Figure 
5.2, in which the midpoints of the class intervals are the respective 
ranks and the frequency in each class is 1. To find the percentile rank 
of, say, the individual who is in position 5, we need merely to divide 
7.5, the cumulative frequency below 5, by 12 and express the quotient 
as a percentage. Thus, the percentile rank of the individual who is in 
the fifth position in a group of 12 is 7.5/12 × 100 or 62.5, which tells 
us that the individual exceeds 62.5 per cent of the group. 

It is left as an exercise for the student to show that 


$$PR(R) = 100\left(\frac{N - R + .5}{N}\right), \qquad (5.2)$$


in which R is the rank whose percentile value is desired and N the 
number of individuals ranked, it being agreed that “1” indicates the 
highest or best and “N” the lowest or poorest positions. 

When there are ties for position, formula (5.2) is applicable, pro- 
vided the average of the serial ranks tied for is assigned to each of 
the ties. To illustrate: 


Individual A is first and he is assigned the rank of 1. 
B and C tie for second and third and each is assigned the rank of 2.5. 
D is fourth and he is assigned the rank of 4. 
E, F, and G tie for fifth, sixth, and seventh, and each is assigned 
the rank of 6. 
H is eighth and he is assigned the rank of 8. 
(And so on until all individuals have been ranked.) 
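
The treatment of ranks and ties may be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, rank 1 is taken as the highest position, and the ordered data are supplied as groups of tied individuals listed from best to poorest.

```python
def percentile_rank_from_rank(rank, n):
    """Formula (5.2): PR(R) = 100 * (N - R + .5) / N, with rank 1 the highest."""
    return 100.0 * (n - rank + 0.5) / n


def ranks_with_ties(groups):
    """Give every member of a tied group the average of the serial ranks tied for.

    `groups` is a list of lists, best position first, e.g. [["A"], ["B", "C"]].
    """
    ranks = {}
    next_serial = 1
    for group in groups:
        serials = range(next_serial, next_serial + len(group))
        average = sum(serials) / len(group)
        for name in group:
            ranks[name] = average
        next_serial += len(group)
    return ranks


# The fifth of 12 ranked individuals, as in the text.
print(percentile_rank_from_rank(5, 12))

# The tie illustration: A; B and C tied; D; E, F, and G tied; H.
print(ranks_with_ties([["A"], ["B", "C"], ["D"], ["E", "F", "G"], ["H"]]))
```

The averaged ranks returned by `ranks_with_ties` can be passed directly to `percentile_rank_from_rank`, which is the sense in which formula (5.2) remains applicable when there are ties.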


USES AND LIMITATIONS OF PERCENTILE RANKS 

The most frequent application of percentiles is in testing. Stand- 
ardized tests, particularly those used above elementary school grades, 
usually report norms in terms of the percentile values of raw scores. 


Fig. 5.2. Histogram of ranked data. [Figure: frequency plotted against 
rank order.] 


If an individual obtains a score on a standardized test which has a 
percentile rank of 35, we know that the individual exceeds 35 per cent 
of the group used in standardizing the test. This is straightforward 
and valuable information, provided the individual can fairly be com- 
pared with those in the standardizing group. 

When obtained scores or rank orders are changed to percentile 
ranks, an individual's performance on tests or his position in ordered 
series can be brought into comparison, regardless of the dissimilar 
nature of the original or raw scores. Individual profiles, needed in 
counseling and placement, frequently use percentile ranks as the 
common unit. 

The percentile method of transforming raw scores to comparable 
scores has two notable limitations. In the first place, percentile ranks 
are not subject to algebraic treatment, and hence cannot logically be 
used when two or more scores are to be combined into a composite 
score. This limitation, however, does not tend to be of any great 
importance in practical work. A composite score in most situations 
eliminates the very details which need to be taken into account. An 
individual's strength in one prerequisite ability can rarely be 
considered to offset his weakness in a second. 

The second limitation is of greater importance. Percentile ranks 
are not in proportional relationship to the raw scores. Consider 
again the distribution of Table 5.1. We have already found the 
percentile rank of a score of 38 to be 63. By similar methods we would 
find the percentile ranks of 35, 18, and 15 to be, respectively, 51, 4, 
and 2. The difference between the percentile ranks corresponding to 
38 and 35 is 12; the difference between the percentile ranks 
corresponding to 18 and 15 is 2. Thus a difference of 3 between raw 
scores is represented by 12 at one place on the percentile scale, but by 
2 at another place. Similar disproportionality exists at the upper end of 
the percentile scale. However, percentile ranks of the middle 80 per 
cent of the scores tend to be proportional in distributions similar in 
shape to that of Table 5.1. 

No general statement can be made about the extent of nonpro- 
portionality in the percentile scale and the consequences of obscuring 
through its use relatively great differences near the top and bottom of a 
distribution. These depend upon the shape of the distribution and the 
uses to which the percentile scores are to be put. Obviously, if the 
scale of raw scores is accurate and trustworthy throughout and the 
distribution is more or less bell-shaped, the use of percentile measures 
introduces error. However, raw score scales are frequently inaccurate, 
particularly at the extremes. When this is the case, the percentile 
system, which makes only the modest assumption that the scores are 
ordered, is appropriate. 


THE CUMULATIVE PERCENTAGE CURVE 


As noted above, percentile ranks may be obtained quickly from 
the cumulative percentage curve, a graph of wide usefulness in statis- 
tical work. In constructing the curve, the cumulative frequencies of a 
distribution are changed to cumulative percentages by dividing suc- 
cessively by N and multiplying by 100. For example, in the distribution 
of Table 5.1, the lowest cumulative frequency gives a cumulative per- 
centage of 100(1)/293, or .3; the next lowest, 100(2)/293 or .7; the 
third lowest, 100(6)/293 or 2.0; and so on. 
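
The conversion of cumulative frequencies to cumulative percentages can be sketched in Python as below. The names are hypothetical, and pairing each percentage with the upper real limit of its class (8.5, 11.5, and so on) is our assumption about how the points of the curve are located.

```python
from itertools import accumulate

# Frequencies of Table 5.1, lowest class (6-8) first.
FREQUENCIES = [1, 1, 4, 3, 11, 16, 14, 29, 35, 44, 33, 33, 25, 26, 17, 0, 0, 1]

# Upper real limit of each class, assuming a class interval of 3.
UPPER_REAL_LIMITS = [8.5 + 3 * k for k in range(len(FREQUENCIES))]


def cumulative_percentages(frequencies):
    """Divide each cumulative frequency by N and multiply by 100."""
    n = sum(frequencies)
    return [100.0 * cum_f / n for cum_f in accumulate(frequencies)]


for limit, pct in zip(UPPER_REAL_LIMITS, cumulative_percentages(FREQUENCIES)):
    print(f"{limit:5.1f}  {pct:5.1f}")
```

The first three values printed are the .3, .7, and 2.0 computed in the text, and the last is necessarily 100.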

After the cumulative percentages are obtained, they are plotted 
as illustrated in Figure 5.3. The cumulative frequency scale is shown 


Fig. 5.3. Cumulative percentage curve of a distribution of 293 scores. (See 
Table 5.1.) [Figure: reading score on the horizontal scale; percentile 
scale, 0 to 100, at the left; cumulative frequency scale, 0 to 293, at 
the right.] 

Fig. 5.4. Cumulative percentage curves of four distributions. (From Table 3.1.) 
[Figure: score on the horizontal scale; percentile scale on the vertical.] 


at the right as an aid to understanding the figure. It ordinarily is not 
included, since it adds nothing useful to the graph. The construction 
is similar to the construction of the cumulative frequency curve of 
Chapter II, and is further illustrated in Figure 5.4. It will be noticed 
that since the cumulative percentages are obtained by multiplying the 
cumulative frequencies by the constant 100/N, the two curves have 
similar shapes. 

The cumulative percentage curve provides a convenient method of 
finding percentile ranks. For example, in Figure 5.3, if we want to 
know the percentile rank of a raw score of 38, we go along the 
horizontal scale to 38, vertically upward to the curve, across to the 
percentile scale, and read 63. Thus, a score of 38 in this distribution has 
a percentile rank of about 63, i.e., it exceeds about 63 per cent of the 
293 reading scores. 

When a cumulative percentage curve is plotted on cross-section 
paper, the percentile ranks of the raw scores in a distribution can be 
quickly found. Although the graphical method is not as accurate as 
the arithmetic method summarized in formula (5.1), it is ordinarily 
accurate enough for practical work. 


OTHER USES OF THE CUMULATIVE PERCENTAGE CURVE 


In addition to permitting rapid calculation of percentile ranks, 
the cumulative percentage curve is useful in finding percentiles and in 
comparing distributions. We digress briefly to consider these uses. 

The curve comprehensively summarizes much of the information 
available in a frequency distribution. When it forms a smooth ogive, 
the distribution approximates the symmetrical, bell-shaped form. 
When it is comparatively narrow and steep, the distribution has 
comparatively little variability. Percentiles and interpercentile ranges can 
be readily approximated from the curve. For example, we can quickly 
determine the approximate value of P90 for the distribution of reading 
scores, Figure 5.3, by going up the percentile scale to 90, horizontally 
across to the curve, and vertically downward to the scale of scores. 
We strike the latter scale at about 46; thus, 46 is roughly the point 
below which 90 per cent of the 293 scores fall. Similarly, we would 
determine P10 as about 22, so that P90 − P10 is about 24, which is 
quite close to the results obtained earlier by the arithmetic procedures 
of formula (4.1). 
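
The graphical readings can be checked against an arithmetic interpolation. The Python sketch below is offered only as an illustration: the function name `percentile` is hypothetical, and the linear interpolation within the class is assumed to correspond to the arithmetic procedure the text attributes to formula (4.1), which is not reproduced here.

```python
# Table 5.1 as (lowest score, highest score, frequency), lowest class first.
TABLE_5_1 = [
    (6, 8, 1), (9, 11, 1), (12, 14, 4), (15, 17, 3), (18, 20, 11),
    (21, 23, 16), (24, 26, 14), (27, 29, 29), (30, 32, 35), (33, 35, 44),
    (36, 38, 33), (39, 41, 33), (42, 44, 25), (45, 47, 26), (48, 50, 17),
    (51, 53, 0), (54, 56, 0), (57, 59, 1),
]


def percentile(p, classes):
    """Score below which p per cent of the scores fall (linear interpolation)."""
    n = sum(f for _, _, f in classes)
    target = p / 100.0 * n  # number of scores that must lie below the point
    cum_below = 0
    for low, high, f in classes:
        if f > 0 and cum_below + f >= target:
            lower_real = low - 0.5
            width = (high + 0.5) - lower_real
            return lower_real + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("p must lie between 0 and 100")


p90 = percentile(90, TABLE_5_1)
p10 = percentile(10, TABLE_5_1)
print(round(p90, 1), round(p10, 1), round(p90 - p10, 1))
```

Under these assumptions the interpolated values round to the 46, 22, and 24 read from the curve.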

When the cumulative percentage curves of two or more distri- 
butions are constructed on the same axes, various comparisons are 
facilitated. The frequencies, cumulative frequencies, and cumulative 
percentages of distributions A, B, E, and G of Table 3.1 are shown 
below, and the curves are plotted in Figure 5.4. 


                A                  B                  E                  G 
SCORE    f  CUM f  CUM %    f  CUM f  CUM %    f  CUM f  CUM %    f  CUM f  CUM % 
51-53                                          1    35   100.0 
48-50                                          1    34    97.1 
45-47                                          3    33    94.3    1    32   100.0 
42-44                                          8    30    85.7    0    31    96.9 
39-41    1    23   100.0                       4    22    62.9    1    31    96.9 
36-38    2    22    95.7                       8    18    51.4    3    30    93.8 
33-35    9    20    87.0    2    18   100.0    3    10    28.6    2    27    84.4 
30-32    4    11    47.8    5    16    88.9    2     7    20.0    4    25    78.1 
27-29    4     7    30.4    7    11    61.1    3     5    14.3    4    21    65.6 
24-26    2     3    13.0    0     4    22.2    1     2     5.7    3    17    53.1 
21-23    1     1     4.3    2     4    22.2    0     1     2.9    2    14    43.8 
18-20                       0     2    11.1    1     1     2.9    2    12    37.5 
15-17                       1     2    11.1                       4    10    31.2 
12-14                       1     1     5.6                       2     6    18.8 
 9-11                                                             1     4    12.5 
 6-8                                                              0     3     9.4 
 3-5                                                              3     3     9.4 


The curves permit an astonishing number of comparisons between 
the distributions. To point out a few, the interquartile ranges for 
each distribution can readily be approximated by dropping vertical 
lines from the points where the twenty-fifth and seventy-fifth 
percentile lines intersect the curve to the scale of scores and reading the 
distance thus demarcated. Other interpercentile ranges can be 
determined similarly. The percentage of scores in the distributions that 
fall below any given score can quickly be estimated. For example, 
roughly 95 per cent of the scores in A and G and 100 per cent of the 
scores in B fall below 38.5, which is approximately the median score 
in E. The student is asked to make several other comparisons in 
Exercise 8. 

When graphic comparisons are desired, cumulative percentage 
curves are extremely useful, perhaps the most useful and informative 
of the various graphic devices. They permit comparisons regardless 
of the size of the distributions and give answers to questions regarding 
overlap that are accurate enough for many purposes. Moreover, there are 
methods available for determining whether the curves of two sample 
distributions differ sufficiently to discredit the hypothesis that the parent 
distributions are alike. (See Ref. 44.) 


Standard Scores 

If the mean and standard deviation of a series are known, it is 
possible to express the deviation of any score from the mean as a 
multiple of the standard deviation. When a score is expressed in this 
manner, it is commonly called a standard or z score. Symbolically, 

$$z = \frac{X - M}{s}, \qquad (5.3)$$

the mean always being subtracted from the score. Since the deviation 
X − M commonly is represented by x, definition (5.3) may be written 
z = x/s. 

To transform a set of raw scores to standard scores, we need only 
to find the mean and standard deviation of the set and divide the 
respective deviations of the scores from their mean by the standard 
deviation. In the A series of Table 3.1, for example, in which M = 
31.7 and s = 4.2, a raw score of 37 has the standard or z score 
equivalent of (37 − 31.7)/4.2 or about 1.3; and a raw score of 24 has 
the z score equivalent of (24 − 31.7)/4.2 or about −1.8. The z score 
+1.3 means merely that its raw score equivalent is 1.3s above the 
mean, and the z score −1.8 means that its raw score equivalent is 
1.8s below the mean. In short, a standard score indicates how many 
standard deviations the corresponding raw score is from the mean 
of the series. 
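
Definition (5.3) translates directly into a short Python sketch. The function names are hypothetical, the five scores passed to `standardize` are invented for illustration, and the standard deviation is assumed to be the one computed with N in the denominator (`statistics.pstdev`), which may or may not agree with the computing formula used elsewhere in the book.

```python
from statistics import mean, pstdev


def z_score(x, m, s):
    """Standard score of x in a series with mean m and standard deviation s."""
    return (x - m) / s


def standardize(scores):
    """Transform a whole set of raw scores to z scores.

    Assumption: s is the standard deviation with N in the denominator.
    """
    m = mean(scores)
    s = pstdev(scores)
    return [z_score(x, m, s) for x in scores]


# The two scores from the A series discussed in the text (M = 31.7, s = 4.2).
print(round(z_score(37, 31.7, 4.2), 1))
print(round(z_score(24, 31.7, 4.2), 1))

# A made-up series, used only to show the transformation of a full set.
print([round(z, 2) for z in standardize([24, 28, 31, 33, 37])])
```

The first two calls return the 1.3 and −1.8 given above; the third shows that a transformed set has a mean of 0 and, under the stated assumption, a standard deviation of 1.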




INTERPRETATION AND USE OF STANDARD SCORES 


Standard scores have several advantages in statistical theory and 
practice. They are algebraic and hence are tractable in mathematical 
discussion. Since a standard score is derived by dividing a deviation 
from the mean by the standard deviation, both of which are in the 
same unit, it is an abstract quantity, i.e., a quantity independent of 
the original measurement unit. The mean and standard deviation of 
any series of standard scores are 0 and 1, respectively, a fact which 
the student is asked to prove in Exercise 19. As our study proceeds, 
we shall find that the use of standard scores simplifies many statistical 
procedures. 

Standard scores are widely used in testing. When a distribution of 
test scores is normal or approximately so, standard scores reveal a 
great deal of information. A z score of 0 indicates a raw score at the 
mean; a positive z score, a raw score above the mean; and a nega- 
tive z score, a raw score below the mean. A z score of 3.00 is very 


TABLE 5.2 


Percentages of Scores Falling Below Selected z 
Scores in a Normal Distribution 


(In each pair of columns, z = (X − M)/s.) 

          PERCENTAGE             PERCENTAGE             PERCENTAGE 
   z      OF SCORES       z      OF SCORES       z      OF SCORES 
           BELOW z                BELOW z                BELOW z 
 −3.0        .1%        −1.0       15.9%        +1.0       84.1% 
 −2.9        .2         − .9       18.4         +1.1       86.4 
 −2.8        .3         − .8       21.2         +1.2       88.5 
 −2.7        .4         − .7       24.2         +1.3       90.3 
 −2.6        .5         − .6       27.4         +1.4       91.9 
 −2.5        .6         − .5       30.8         +1.5       93.3 
 −2.4        .8         − .4       34.5         +1.6       94.5 
 −2.3       1.1         − .3       38.2         +1.7       95.5 
 −2.2       1.4         − .2       42.1         +1.8       96.4 
 −2.1       1.8         − .1       46.0         +1.9       97.1 
 −2.0       2.3           0        50.0         +2.0       97.7 
 −1.9       2.9         + .1       54.0         +2.1       98.2 
 −1.8       3.6         + .2       57.9         +2.2       98.6 
 −1.7       4.5         + .3       61.8         +2.3       98.9 
 −1.6       5.5         + .4       65.5         +2.4       99.2 
 −1.5       6.7         + .5       69.2         +2.5       99.4 
 −1.4       8.1         + .6       72.6         +2.6       99.5 
 −1.3       9.7         + .7       75.8         +2.7       99.6 
 −1.2      11.5         + .8       78.8         +2.8       99.7 
 −1.1      13.6         + .9       81.6         +2.9       99.8 
                                                +3.0       99.9 


exceptional, since it is 3s above the mean; and a z score of —3.00 
is very exceptional, since it is 3s below the mean. Less than .3 per 
cent of the scores in a normal distribution deviate from the mean by 
as much as ±3s. In Table 5.2, the percentages of scores in a normal 
distribution falling below the indicated z scores are shown. Thus, 
z scores in a normal distribution can easily be transformed to per- 
centile ranks, and vice versa. (See pp. 115-116.) The student should 
study Figure 4.1, Table 5.2, and Table C, Appendix, until he can 
reconcile the three. 
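
Percentages of the kind shown in Table 5.2 can be computed from the normal curve itself. The sketch below is an illustration under assumptions: the function name `percent_below` is hypothetical, and the cumulative normal proportion is obtained from the error function in Python's standard `math` module rather than from Table C. Values so computed may differ from a printed table in the last digit because of rounding.

```python
from math import erf, sqrt


def percent_below(z):
    """Percentage of a normal distribution falling below the standard score z."""
    return 50.0 * (1.0 + erf(z / sqrt(2.0)))


for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"{z:+.1f}  {percent_below(z):5.1f}")

# Percentage deviating from the mean by 3s or more in either direction.
outside = 100.0 - (percent_below(3.0) - percent_below(-3.0))
print(round(outside, 2))
```

The last figure printed is the percentage lying beyond ±3s, which bears out the statement that it is less than .3 per cent.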

In the nonnormal distribution, the ratio of a deviation from the 
mean to the standard deviation, unlike the ratio in a normal distribu- 
tion, does not have exact meaning in terms of areas or percentages. 
However, it is an observable fact that the ratio tends to correspond to 
"normal" percentages except in markedly nonnormal distributions. 
The standard score transformation may be applied to the majority of 
distributions encountered in psychological testing without serious 
distortion. 

Whether percentile ranks are preferable to standard scores is a 
question for which there is no generally satisfactory answer. In deciding 
which of the two to use, their limitations and advantages have to be 
weighed in view of the purpose of the tests and the distributions of 
raw scores. 

The percentile method is easier to apply and percentile ranks are 
more readily interpreted and more widely understood. The percentile 
method is applicable to distributions of any shape. As we have seen, 
however, percentile ranks are not proportional to raw scores; and the 
disproportionality may be severe in the upper and lower tenths of the 
distribution. This disadvantage does not hold for standard scores. 
Standard scores can be averaged, although, since a composite score 
based on several tests rarely provides as much information as the 
separate scores, this tends to be of little importance in psychological 
testing. On the other hand, standard scores tend to lose meaning as a 
distribution departs from normality. It would seem that neither has 
unqualified superiority over the other. Percentile ranks are generally 
adequate in practical work. In statistical theory, however, standard 
scores are by far more useful. 


CHANGING STANDARD SCORES TO POSITIVE WHOLE NUMBERS 


The transformation of raw scores to standard scores results in 
decimal numbers, some of which are negative. In order to eliminate 
decimals and negative signs, standard scores frequently are multiplied 
by one constant and added to another constant. A widely used scheme 
is the one in which the standard scores are multiplied by 10 and added 
to 50. 

Standard scores which have been multiplied by 10 and added to 
50 are usually designated by a capital Z, so that 


Z = 10z + 50, (5.4) 


in which z is defined by (5.3). When raw scores are normally dis- 
tributed, their Z score equivalents are identical to the well-known 
McCall T scores, as will be shown in Chapter VI. The mean and 
standard deviation of a set of Z scores are 50 and 10, respectively. 

Various other methods of expressing standard scores are used. 
Some of these do not eliminate decimals, but do eliminate negative 
signs by adding a constant, such as 3 or 5, to each z score. The VAT 
and MAT scores of Table A, Appendix, are subsets of large sets of 
standard scores which have been multiplied by 100 and added to 500. 
This particular form of the standard score is used extensively by the 
Educational Testing Service of Princeton, New Jersey. Designating 
this score Z’, we may write 


Z' = 100z + 500, (5.5) 
where z is defined by (5.3). 
Z and Z' scores are interpreted by changing them back to z scores 
and referring to Table C. For example, if Z' = 450, z = (450 − 
500)/100 = −.5. According to Table C, a z of −.5 corresponds to a 
percentile rank of 31, assuming a normal distribution. The 
relationships among selected scores and percentile ranks are shown below. 


                                                    PERCENTILE RANK 
z = (X − M)/s    Z = 10z + 50    Z' = 100z + 500    (ASSUMING NORMALITY) 

     2.5              75               750                  99 
     2.0              70               700                  98 
     1.0              60               600                  84 
      .5              55               550                  69 
     0                50               500                  50 
    −.5               45               450                  31 
   −1.0               40               400                  16 
   −2.0               30               300                   2 
   −2.5               25               250                   1 
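
The relationships among z, Z, Z', and the percentile rank under normality can be sketched in Python as follows. The function names are hypothetical, and the normal percentage is computed with the error function from the standard `math` module in place of a lookup in Table C.

```python
from math import erf, sqrt


def to_Z(z):
    """Formula (5.4): Z = 10z + 50."""
    return 10 * z + 50


def to_Z_prime(z):
    """Formula (5.5): Z' = 100z + 500."""
    return 100 * z + 500


def z_from_Z_prime(z_prime):
    """Change a Z' score back to a z score."""
    return (z_prime - 500) / 100


def normal_percentile_rank(z):
    """Percentage below z in a normal distribution (normality is assumed)."""
    return 50.0 * (1.0 + erf(z / sqrt(2.0)))


# The example from the text: Z' = 450.
z = z_from_Z_prime(450)
print(z, round(normal_percentile_rank(z)))

# Selected z scores with their Z, Z', and percentile rank equivalents.
for z in (2.5, 2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0, -2.5):
    print(z, to_Z(z), to_Z_prime(z), round(normal_percentile_rank(z)))
```

The loop covers the same z scores as the tabulation above, so the two can be compared line by line.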


CONCLUDING REMARKS 


The purpose of transforming raw scores to percentile ranks or 
standard scores is to make the scores from different tests and other 
instruments comparable. As a rule, the transformations are not 
desirable in comparing the averages of two or more groups of 
individuals. Such comparisons tend to be more exact and trustworthy 
when raw scores are used. 


EXERCISES 


1. An individual has a percentile rank of 18. Why can it not be said that he 
is in the lowest quartile? 

2. Many college grading systems are based on 100, with 65 as the passing 
grade. What is the difference between a percentage grade in this system 
and a percentile rank? 

3. What is indicated by the relatively steep part of the cumulative percentage 
curve? The relatively flat parts? For what kind of distribution would the 
cumulative percentage curve be a straight line? 

4. Ten compositions have been ranked in respect to originality from 1, most 
original, to 10, least original. What is the percentile rank of each? 

5. Determine arithmetically the 75th and 25th percentiles of the distribution 
in Table 5.1 and compare the values with those obtained from the curve 
in Figure 5.3. 

6. The percentile norms for the standardized reading test used in obtaining 
the distribution shown in Table 5.1 and Figure 5.3 are as follows: 
P5 = 25, P10 = 33, P25 = 38, P50 = 44, P75 = 52, P90 = 57, and 
P95 = 64. How do the percentiles in the observed distribution compare 
with the norms? 

7. By use of the curve in Figure 5.3, find approximately (a) the median of the 
distribution, (b) the score exceeded by 95% of the group, (c) the 
percentage exceeding a score of 25, and (d) the percentage falling below a 
score of 43. 

8. Referring to Figure 5.4: (a) Is it possible to tell roughly which of the four 
distributions is the most variable? (b) What do the irregularities in the 
curves indicate? (c) About what percentage of the B distribution is above 
P50 of G? (d) About what per cent of the G distribution is below P25 
of E? (e) A student having a percentile rank in School E of 30 would 
have about what percentile rank in A? In B? In G? (f) Estimate Q and D 
for each of the four distributions. (g) About what percentage of scores 
fall in the interval Mdn ± Q in each of the distributions? 

9. The distributions of the scores of 50 poor, 50 average, and 50 good prob- 
lem solvers on the Raven Matrices test are shown below. Construct 
cumulative percentage curves of the distributions on a common scale. 
(a) About what percentage of the poor problem solvers are above P75 
of the good? Of the average? (b) About what percentage of the good 
problem solvers are below P25 of the average? Of the poor? 


               PROBLEM SOLVERS 

SCORE     POOR    AVERAGE    GOOD 
55-59                 1        2 
50-54       2         3        5 
45-49       3         9       10 
40-44       7         9       13 
35-39      10        15       12 
30-34      12         7        5 
25-29       9         4        2 
20-24       3         2        1 
15-19       3 
10-14       1 


10. An investigator found that in an observed distribution a standard score 
of .5 corresponded to a percentile rank of 74. He concluded that the 
distribution was non-normal. Was he correct? 

11. The distribution of raw scores and corresponding percentile ranks of 67 
dental school applicants on a spatial relations test are shown below. 
The mean and standard deviation of the distribution are 20.9 and 2.4, 
respectively. (a) Find the z, Z, and Z' scores corresponding to the raw 
scores. (b) By use of Table C, Appendix, find the percentile ranks corre- 
sponding to the z scores. Why do these differ from the actual percentile 
ranks shown? (c) Show that the frequency polygons of the raw and 
standard scores are similar in shape. (d) What are the percentile score 
differences and the standard score differences which correspond to the 
raw score differences: 15-14, 18-17, 21-20, 25-24? 


                  PERCENTILE 
SCORE      f         RANK 
  26       1         99.3 
  25       4         95.5 
  24       5         88.8 
  23       7         79.8 
  22      10         67.2 
  21      10         52.2 
  20      15         33.6 
  19       5         18.7 
  18       4         11.9 
  17       3          6.7 
  16       2          3.0 
  15       0          1.5 
  14       1           .7 


12. What are the advantages of transforming raw scores to percentile ranks? 

13. What are the advantages of transforming raw scores to standard scores ? 

14. Is it possible for a z score of 1.5 to represent the highest score in a series? 
Is it possible for a z score to have a value greater than 3? Explain. 

15. The mean and standard deviation of a series of heights are 64.0 in. and 
2.0 in., respectively. What is the standard score in height of an individual 
whose height is 60 in.? What would be his standard score if heights were 
measured in feet? 

16. The means and standard deviations of the heights and weights of a group 
are 68.0 in. and 2.5 in. and 150.0 lbs. and 10.0 lbs., respectively. If an 
individual's height is 72 in. and his weight 180 lbs., why is he "heavier 
than tall" with respect to the group? 

17. In the above distribution, an individual has a standard score in height of 
1.5 and a standard score in weight of .5. What are his height in inches and 
his weight in pounds? 

18. A student has a standard score, Z, of 32 in a distribution of scores in 
which M equals 75.0 and s equals 7.5. What is his raw score? 

19. Show that the mean of a set of z scores is 0 and the standard deviation 1. 

20. Show that if each z score in a set is multiplied by K and the product added 
to H, the mean of the resulting scores is H and the standard deviation K. 


CHAPTER VI 


The Normal Curve 


At various places in the preceding chapters, reference has been made 
to the normal distribution as a type of frequency distribution which 
many sets of observational data tend to approximate. The student 
will find it helpful to review pp. 35-36 at this time. 

The normality of data is a concept of great usefulness in statistical 
theory and practice, and no student can use and interpret statistics 
successfully without some understanding of the normal curve. It is 
no exaggeration to say that the “normal law,” namely, the greater a 
deviation from the mean or expected value in a series the less frequently 
it occurs, is the very foundation of statistical theory. 


The Normal Curve as a Limiting Form 


The normal curve may be thought of as the limiting form of the 
frequency polygon of normally distributed data. When we group 
such data in smaller and smaller classes, their frequency polygon 
resembles more and more the normal curve. Consider the 400 normally 
distributed scores of Table B, Appendix. In Table 6.1, the 400 scores 
are shown grouped in class intervals of 9, 7, 5, and 3. The frequency 
polygons of the four distributions are plotted in Figure 6.1. When we 
inspect them we note that the smaller the grouping interval or, what 
amounts to the same thing, the greater the number of sides of the 
frequency polygon, the more nearly the polygon resembles the normal 
curve. If we had a very large number of continuous normal scores, 
we could make the grouping interval as small as we please and still 
have frequencies for each interval. By making the grouping interval 
smaller and smaller, we might approximate the smooth normal curve 
to any desired degree of exactness. For this reason, the normal curve 
may be thought of as the limiting form of the frequency polygon of 
normally distributed data. It follows of course that the relationships 
between frequencies and area in the frequency polygon will hold in 
the normal curve. 


TABLE 6.1 

400 Normally Distributed Scores Grouped by 
Intervals of 9, 7, 5, and 3 

INTERVAL OF 9    INTERVAL OF 7    INTERVAL OF 5    INTERVAL OF 3
SCORE     f      SCORE     f      SCORE     f      SCORE     f
                                                   69-71     1
                                                   66-68     1
                                                   63-65     3
                                                   60-62     5
                                  68-72     1      57-59    10
                                  63-67     4      54-56    15
                 65-71     3      58-62    11      51-53    24
63-71     5      58-64    13      53-57    26      48-50    32
54-62    30      51-57    43      48-52    49      45-47    40
45-53    96      44-50    86      43-47    69      42-44    45
36-44   138      37-43   110      38-42    80      39-41    48
27-35    96      30-36    86      33-37    69      36-38    45
18-26    30      23-29    43      28-32    49      33-35    40
 9-17     5      16-22    13      23-27    26      30-32    32
                  9-15     3      18-22    11      27-29    24
                                  13-17     4      24-26    15
                                   8-12     1      21-23    10
                                                   18-20     5
                                                   15-17     3
                                                   12-14     1
                                                    9-11     1
TOTAL   400      TOTAL   400      TOTAL   400      TOTAL   400
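
The relation between the finest and the coarsest grouping of Table 6.1 can be checked with a few lines of Python. The sketch below is illustrative only; the names `freq_by_3` and `regroup` are introduced here and do not come from the text. It merges every three adjacent classes of the interval-3 distribution to obtain the interval-9 distribution, which is the reverse of the refinement described above.

```python
# Frequencies of the 400 scores grouped by intervals of 3 (Table 6.1),
# listed from the lowest class (9-11) to the highest class (69-71).
freq_by_3 = [1, 1, 3, 5, 10, 15, 24, 32, 40, 45, 48,
             45, 40, 32, 24, 15, 10, 5, 3, 1, 1]


def regroup(freqs, k):
    """Merge every k adjacent classes into one wider class."""
    return [sum(freqs[i:i + k]) for i in range(0, len(freqs), k)]


# Classes 9-17, 18-26, ..., 63-71 of the interval-9 distribution.
freq_by_9 = regroup(freq_by_3, 3)
print(sum(freq_by_3), freq_by_9)
```

Each coarser polygon has fewer sides than the finer one it summarizes, which is why the finer grouping follows the smooth curve more closely.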


Fig. 6.1. Frequency polygons of normally distributed data grouped by 
intervals of 9, 7, 5, and 3. (From Table 6.1.) 


Areas and Frequencies Under the Normal Curve 


In earlier chapters we have seen that proportions of the area under 
the normal curve correspond to standard-deviation or z-score intervals 
on the base line. For example, .683 or 68.3 per cent of the area 
is subtended by, or corresponds to, the interval M ± 1s, while 95.4 
per cent corresponds to the interval M ± 2s. 

The area and z-score relationships in the normal curve underlie 
a great deal of statistical method, and we shall consider them at some 
length. After they are understood, we shall find it easy to apply the 
normal curve to various kinds of problems. 


PROPORTIONS OF AREA BETWEEN GIVEN ORDINATES 
OF THE NORMAL CURVE 


The proportions of area under the normal curve included by the 
ordinate at the mean (z = 0) and the ordinates at .01 distances from 
the mean are given in Table C, Appendix. The left-hand column of 
the table gives z values to tenths; the second decimal place in z is to 
be found across the top of the table. If we wish to know the area 
between the ordinate at z = 0.00 and the ordinate at z = 1.96, we go 
down the left-hand column to 1.9, over to the column headed .06, and 
read .4750. The area in question is shown in Figure 6.2. Since the 
total area under the curve to the right of the ordinate at z = 0.00 is 
.5000, the area to the right of the ordinate at z = 1.96 is .0250. 

Fig. 6.2. Proportion of area under normal curve between ordinates at z = 0 
and z = 1.96 (area .4750). 

Now suppose that we wish to know the proportion of area under 
the curve between the ordinates at, say, z = -1.00 and z = +1.00. 
From the table we find the area under the curve between the ordinates 
at z = 0.00 and z = 1.00 to be .3413. Since the same proportion of 
area lies between the ordinate at z = 0.00 and z = -1.00, the 
ordinates at z = -1.00 and z = +1.00 include .3413 + .3413 or .6826 
of the area, as shown in Figure 6.3. 

Now suppose that we wish to find the area under the curve between 
the ordinates at z = .50 and z = 2.50. The area between the 
ordinates at z = 0.00 and z = 2.50 is .4938 and the area between 
the ordinates at z = 0.00 and z = .50 is .1915. Hence, the area 
between the ordinates at z = .50 and z = 2.50 is .4938 - .1915 or 
.3023, as shown in Figure 6.4. 


Fig. 6.3. Proportion of area under normal curve between ordinates at 
z = -1.00 and z = +1.00. 

Fig. 6.4. Proportion of area under normal curve between ordinates at z = .50 
and z = 2.50. 


If it is desired to express proportions of area as percentages we 
may, of course, merely multiply by 100. Thus, .3023 of the area may 
be expressed as 30.23 per cent of the area. 
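
The three table look-ups just described can be reproduced by computation. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist` class as a stand-in for Table C; the helper name `area_between` is introduced here for illustration.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def area_between(z_low, z_high):
    """Proportion of area under the curve between two ordinates."""
    return std.cdf(z_high) - std.cdf(z_low)


print(round(area_between(0.00, 1.96), 4))    # mean to z = 1.96
print(round(area_between(-1.00, 1.00), 4))   # z = -1.00 to z = +1.00
print(round(area_between(0.50, 2.50), 4))    # z = .50 to z = 2.50
```

Exact computation gives .6827 for the interval from -1.00 to +1.00, whereas doubling the rounded table entry .3413 gives .6826; differences of this size in the fourth decimal place are to be expected when four-place tables are used.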


DETERMINING THE INTERVAL WHICH INCLUDES 
A GIVEN PROPORTION OF AREA 


We may use Table C to determine the z values of the ordinates 
that include specified proportions or percentages of the area under 
the standard normal curve. For example, suppose we wish to deter- 
mine the z values of the ordinates that include the middle 50 per cent 
or .5000 of the area. Since the middle .5000 is specified, it follows that 
.2500 will lie on either side of the ordinate at the mean. Hence, we 
find the proportion in the body of Table C that is nearest to .2500. 
This is .2486, which corresponds to a z value of .67, and .67 is the best 
approximation we can make without interpolation. To two-figure 
accuracy, then, the z interval which includes the middle .5000 of the 
area is -.67 to +.67. By linear interpolation between the entries for 
z = .67 and z = .68 we would obtain .67 + (14/31)(.01) or .6745 to 
four-figure accuracy. (See Figure 6.5.) 
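
The reverse look-up can also be sketched in Python. The function name `middle_interval` is hypothetical, and `NormalDist.inv_cdf` from the standard library takes the place of reading Table C backwards. The interpolation line assumes a table entry of .2517 at z = .68, which is the value implied by the fraction 14/31 above.

```python
from statistics import NormalDist

std = NormalDist()


def middle_interval(proportion):
    """z values of the ordinates enclosing the middle `proportion` of area."""
    z = std.inv_cdf(0.5 + proportion / 2)
    return -z, z


low, high = middle_interval(0.50)
print(round(low, 4), round(high, 4))

# Linear interpolation between the two nearest table entries:
# area .2486 at z = .67 and (assumed) area .2517 at z = .68.
z_interp = 0.67 + 0.01 * (0.2500 - 0.2486) / (0.2517 - 0.2486)
print(round(z_interp, 4))
```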

It is suggested that the student work Exercises 2 and 3 at this 
time so that he will become better acquainted with the construction 
and use of Table C. 


RELATIVE AND ABSOLUTE FREQUENCIES FROM AREAS 


Owing to the correspondence of increments of area under the 
standard normal curve to relative frequencies in intervals, Table C 
might just as well be titled, "Relative Frequency of Normally 
Distributed Scores Between the Mean and Given z Distance from the 
Mean." In a normal distribution .3413 of the scores lie in the interval 
bounded by z = 0.00 and z = -1.00; .3413 lie in the interval bounded 
by z = 0.00 and z = +1.00; .5000 lie in the interval bounded by 
z = -.6745 and z = +.6745; and so on. Such facts as these were 
anticipated in Chapter IV. 

Fig. 6.5. The z values of the ordinates which include the middle .5000 of the 
area under the normal curve (z = -0.6745 and z = +0.6745). 


Obviously, for a given distribution, the relative frequencies may 
be changed to absolute frequencies by multiplying by the total 
frequency N. For example, if there are 1,000 scores in a normal 
distribution, 1,000 × .3413 or about 341 lie in the interval bounded by 
z = 0.00 and z = 1.00; 1,000 × .3023 or about 302 lie in the 
interval z = .50 to z = 2.50 (cf. Figure 6.4); and so on. 

In order to use the areas of Table C in determining frequencies in 
specified intervals on the scale of normally distributed scores, we 
must always work with standard or z scores. Suppose, for example, 
that we have a set of 500 normally distributed scores, and that the 
mean and standard deviation of the set are 100.00 and 15.00, 
respectively. In order to determine the number of scores lying between, 
say, 88 and 130, we must first change 88 and 130 to z scores. Since 
the mean is 100.00 and the standard deviation is 15.00, the z scores 
corresponding to 88 and 130 are (88 - 100)/15 or -.80 and 
(130 - 100)/15 or +2.00, respectively. Turning to Table C, we find 
that the proportion of scores lying between -.80 and the mean is 
.2881 and the proportion between the mean and +2.00 is .4772. Hence, 
the proportion of scores between -.80 and +2.00 is .2881 + .4772 or 
.7653, and the number of scores is .7653 × 500 or 382.65, that is, 
about 383. (See Figure 6.6.) 
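
The same computation may be sketched in Python. The function `expected_count` is a hypothetical helper, not part of any library, and the standard-library `NormalDist` class replaces the Table C look-up.

```python
from statistics import NormalDist


def expected_count(n, mean, sd, low, high):
    """Number of scores expected between low and high in a normal distribution."""
    dist = NormalDist(mean, sd)
    return n * (dist.cdf(high) - dist.cdf(low))


z_low = (88 - 100.0) / 15.0     # standard score of 88
z_high = (130 - 100.0) / 15.0   # standard score of 130
count = expected_count(500, 100.0, 15.0, 88, 130)
print(z_low, z_high, round(count, 2))
```

The computed count agrees with the tabled result to within a few hundredths of a score; the small discrepancy arises because Table C carries only four decimal places.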


Fig. 6.6. Number of scores falling between 88 and 130 (z = -0.80 and 
z = 2.00) in a normal distribution, N = 500, M = 100.00, and s = 15.00. 


FITTING A NORMAL CURVE TO A GIVEN FREQUENCY DISTRIBUTION 


The procedure for fitting a normal curve to a given frequency 
distribution involves primarily the calculation of the frequencies which 
would be observed in the classes of the distribution, if the scores were 
in fact distributed normally. 

The procedure is illustrated in Table 6.2. Although the table is 
largely self-explanatory, a few remarks may be helpful. The table is 
arranged to facilitate the calculation of the proportions of area or 
relative frequencies between the real limits of the respective classes. 
After these are determined they are multiplied by 138, the N in the 
present example, to obtain the absolute theoretical normal frequencies 
in classes. The bottom class interval is considered to extend from 
-∞ to 389.5 and the top interval from 749.5 to +∞ on the scale of 
scores. The z values of the real class limits shown in the table are 
found, of course, by subtracting the mean from the limits and dividing 
by the standard deviation. Thus the z value of 599.5 is (599.5 - 
552.11)/79.32 or .60. The proportions in column 5 are obtained from 
Table C, and those in column 6 are the successive differences between 
the proportions in column 5. (For the class that contains the mean, 
the two column 5 proportions lie on opposite sides of the mean and 
are therefore added.) Finally, the proportions are multiplied 
by 138. These products, shown in the last column of the table, are the 
expected or theoretical frequencies in similar classes of a normal 
distribution in which N = 138, M = 552.11, and s = 79.32. If the 
student sketches a normal curve and indicates the z values of the real 
class limits, he will find it easy to follow the computations laid out in 
Table 6.2. (See the partial sketch of Figure 6.7.) 
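
The fitting procedure can be sketched in Python as follows. The real class limits are assumed to run from 389.5 to 749.5 in steps of 30, as the class intervals of the distribution imply, and the variable names are introduced here for illustration only.

```python
from statistics import NormalDist

N, M, S = 138, 552.11, 79.32                    # values given for Table 6.2
limits = [389.5 + 30 * i for i in range(13)]    # real limits 389.5 ... 749.5

dist = NormalDist(M, S)
# Cumulative proportion below each real limit; 0 and 1 stand in for the
# open ends of the bottom and top classes (minus and plus infinity).
cum = [0.0] + [dist.cdf(x) for x in limits] + [1.0]

# Proportion in each class times N gives the theoretical frequency.
theoretical = [N * (hi - lo) for lo, hi in zip(cum, cum[1:])]

for upper, freq in zip(limits + [float("inf")], theoretical):
    print(f"class with upper real limit {upper}: {freq:.2f}")
```

Because the z values here are not rounded to hundredths before the areas are taken, individual results may differ from the tabled theoretical frequencies in the first or second decimal place.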

TABLE 6.2 

Computation of Theoretical Normal Frequencies 
in Class Intervals of a Given Distribution in 
Which N = 138, M = 552.11, s = 79.32 
(Distribution of MAT Scores, Table A, Appendix) 

                                     PROPORTION    PROPORTION    THEORETICAL
                 REAL     z VALUE    OF AREA       OF AREA       NORMAL
                 CLASS    OF REAL    BETWEEN       BETWEEN       FREQUENCY
CLASS      f     LIMIT    CLASS      MEAN AND      CLASS         IN CLASS:
                          LIMIT      REAL CLASS    LIMITS        PROPORTION
                                     LIMIT                       × 138
 (1)      (2)     (3)      (4)         (5)           (6)           (7)

                  +∞       +∞         .5000
750-        ?                                       .0064           .88
                 749.5     2.49       .4936
720-749     ?                                       .0110          1.52
                 719.5     2.11       .4826
690-719     3                                       .0244          3.37
                 689.5     1.73       .4582
660-689     3                                       .0467          6.44
                 659.5     1.35       .4115
630-659    12                                       .0750         10.35
                 629.5      .98       .3365
600-629    13                                       .1108         15.29
                 599.5      .60       .2257
570-599    17                                       .1386         19.13
                 569.5      .22       .0871
540-569    18                                       .1507         20.80
                 539.5     -.16       .0636
510-539    18                                       .1418         19.57
                 509.5     -.54       .2054
480-509     ?                                       .1158         15.98
                 479.5     -.92       .3212
450-479     ?                                       .0803         11.08
                 449.5    -1.29       .4015
420-449    16                                       .0510          7.04
                 419.5    -1.67       .4525
390-419     2                                       .0273          3.77
                 389.5    -2.05       .4798
-389        ?                                       .0202          2.79
                  -∞       -∞         .5000
SUM       138                                      1.0000        138.01


Fig. 6.7. Proportions of area or relative frequencies in bottom three classes 
of the normalized distribution of Table 6.2: .0202 below 389.5 (z = -2.05), 
.0273 between 389.5 and 419.5 (z = -1.67), and .0510 between 419.5 and 
449.5 (z = -1.29). 


The observed frequencies of column 2, Table 6.2, may be graphically 
compared with the theoretical normal frequencies of the last 
column by means of frequency polygons or histograms. The former 
tend to bring out more clearly the extent to which the given 
distribution is fitted by the normal curve. In making the graph, a 
frequency polygon of the given distribution is first drawn in the usual 
manner, then the normal curve is drawn on the same axes, as shown in 
Figure 6.8. Since the bottom and top classes of the normalized 
distribution are of unspecified length, the curve must be extended 
free-hand beyond the second from bottom and second from top class 
midpoints. The intermediate points of the curve are, of course, plotted 
from the theoretical normal frequencies. 

Fig. 6.8. Frequency polygon and fitted normal curve. (From Table 6.2.) 

In a later chapter we shall make use of the differences between 
observed and theoretical normal frequencies in the classes of a sample 
distribution in testing the assumption of population normality. 


Uses of the Normal Curve 


The normal curve has its most important application in sampling 
problems; in fact, it is the very foundation of statistical probability 
and sampling theory, as we shall see in Chapter X. At this point, 
we shall consider a few of the many practical uses of the normal curve. 
In general, whenever we are dealing with a variable which appears to 
be normally distributed, or which we are willing to assume would be 
normally distributed if we could measure it more precisely, normal 
curve properties may be used to advantage. Although the curve is 
mathematically complex, its application to practical problems will 
present little difficulty if the relationships between z scores and areas, 
as given in Table C, Appendix, are understood. 




USE OF THE NORMAL CURVE AS A MODEL 
FOR DISTRIBUTING CATEGORICAL RATINGS 

The normal curve is perhaps best known to teachers and students 
through its use as a model for distributing school marks and other 
categorical ratings. The use is based upon two assumptions: (1) that 
the variable being rated—e.g., achievement in English—is normally 
distributed on a continuous scale, and (2) that the categories cover 
known intervals on the continuum. If a five-category marking scheme 
A, B, C, D, E, and equal intervals are employed, and if the practical 
limits of the z scale are considered to be —2.50 to +2.50, each interval 
will extend 1.0 standard deviation units. Under the two assumptions, 
the distribution of five-category marks would follow the proportions 
shown in Figure 6.9. Since neither assumption is necessarily sound in 
a given class, the normal curve as a model for the distribution of 
marks should be used discriminatingly. The justification of its use lies 
in the frequently observed tendency of reliable measures of the achieve- 
ment of a group of students to approach normality. But, as has been 
noted earlier, every assumption of normality needs to be carefully 
examined before normal curve theory and technique are applied. 
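
The proportions implied by the five-category scheme can be computed directly. The sketch below assumes the category boundaries stated above (five equal intervals of 1.0 standard deviation between z = -2.50 and z = +2.50); the variable names are illustrative, and the small tails beyond the practical limits are simply left out.

```python
from statistics import NormalDist

std = NormalDist()

# Assumed category boundaries on the z scale, lowest to highest.
bounds = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
marks = ["E", "D", "C", "B", "A"]

# Proportion of the normal curve falling in each category's interval.
shares = {
    mark: std.cdf(hi) - std.cdf(lo)
    for mark, lo, hi in zip(marks, bounds, bounds[1:])
}

for mark in reversed(marks):
    print(mark, f"{100 * shares[mark]:.1f} per cent")
```

Under these assumptions roughly 38 per cent of a class would receive C, 24 per cent each B and D, and 6 per cent each A and E, with about 1 per cent of the area lying outside the practical limits.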

The “stanine” (standard nine) scoring scale, introduced by the 
American Air Force during World War II, illustrates an interesting 
use of the normal curve as a model for transforming test scores to a 
set of single-digit scores. (See Ex. 7.) 


STANDARD SCORES AND THE PERCENTILE SYSTEM 


Since the standard deviation is the unit of measurement employed 
in the normal curve, the relationship between standard scores 
and percentiles in a normal distribution is the same as that between 
the z scores and proportions indicated in Table C. Hence, percentiles 
and percentile ranks in a given normal distribution are easily 
determined by a table of normal curve areas. The procedure really 
involves nothing new; in the preceding chapter we made use of the 
relationship in interpreting standard scores. (See Table 5.2.) 

Fig. 6.9. The normal curve as a model for the distribution of categorical 
ratings. (z scale marked at -2.50, -1.50, -0.50, +0.50, +1.50, and +2.50.) 

Suppose we have a normal distribution in which M = 60.00 and 
s = 15.00 and that we wish to find P90, i.e., the point below which 
90 per cent of the scores lie. Since 50 per cent of the scores lie below 
the mean, the remaining 40 per cent must lie between the mean and 
P90. We therefore enter Table C at .3997 (as close as we can come to 
.4000) and find the corresponding z score to be 1.28. Since 1.28 
corresponds to a raw score of 1.28 × 15.00 + 60.00 or 79.2 in the given 
distribution, P90 = 79.2. 

Now suppose we wish to find the percentile rank of a score of 40 
in the same distribution. The z score equivalent of 40 is (40 — 60.00)/ 
15 or —1.33. Entering Table C at 1.33, we find the proportion of 
scores below the point to be .5000 — .4082 or .0918; hence, a score 
of 40 in the given distribution has a percentile rank of approximately 9. 
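
Both computations can be sketched in Python with the standard-library `NormalDist` class in place of Table C. The variable names `p90` and `rank_of_40` are introduced here for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=60.00, sigma=15.00)

p90 = dist.inv_cdf(0.90)           # point below which 90 per cent lie
rank_of_40 = 100 * dist.cdf(40)    # percentile rank of a raw score of 40

print(round(p90, 1), round(rank_of_40))
```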

Since few if any observed distributions are normal, the results 
obtained by the method illustrated above are generally somewhat in 
error, but, unless the distribution is markedly nonnormal, the errors 
tend to be negligible in practical work. 

When there is reason to believe that a population distribution of 
scores is normal or nearly so and it is desired to construct a table 
of percentile norms from the information obtained by giving a test 
to a sample, the method may give somewhat more stable norms than 
those obtained by direct computation from the actual distribution of 
scores in the sample. 


TRANSFORMING QUALITATIVE DATA 


It is frequently useful to transform qualitative data into numerical 
scores. One of the commonest ways of making the transformation is 
that of assigning convenient small numbers to each given quality or 
category, as is done when the numbers 5, 4, 3, 2, 1 are assigned to the 
school marks A, B, C, D, E, respectively. This method is based upon 
the assumption that the numerical differences between categories are 
equal. 

In situations where a qualitatively ordered variable can be 
considered to be normally distributed, the normal curve provides a 
convenient and rational method of quantifying the observations of the 
variable. Suppose that motivation in a group of 40 students has been 
rated by an instructor, with results as shown in Table 6.3. If the 
assumption is reasonable that motivation is normally distributed in 
the given group, we may consider the proportions as segments of the 
normal curve, as pictured in Figure 6.10. Our problem now becomes 
that of determining an average value for each of the segments 
corresponding to the proportions. We cannot take the midpoints of the 
intervals at the bases of the segments, since the distribution 
represented by a segment is not ordinarily symmetrical. 


TABLE 6.3 
Instructor's Ratings of 40 Students on Motivation 

                               STUDENTS RECEIVING RATING
RATING                         NUMBER       PROPORTION
A - Highly Motivated             10            .250
B - Effectively Motivated         9            .225
C - Usually Motivated            14            .350
D - Vacillating                   4            .100
E - Purposeless                   3            .075


Fig. 6.10. Normal curve segments demarcated by proportions: E, .075; 
D, .100; C, .350; B, .225; A, .250. (From Table 6.3.) 

In practice, either the medians or the means of the segments may 
be taken as numerical averages or scores corresponding to the 
categories. The median of a segment is easily found by determining the 
z value of the ordinate that bisects the segment. Let us find the medians 
of the two left-hand segments shown in Figure 6.10. Since 1/2 of the 
area of the .075 segment is .0375, we need only determine the z value 
of the ordinate to the left of which .0375 of the area lies to find the 
median of the .075 segment. Hence, we enter the table of areas, 
Table C, at .5000 - .0375 or .4625 and find the corresponding z to 
be 1.78 to which a negative sign attaches. (Why?) To find the median 
of the .100 segment, we must determine the z value of the ordinate to 
the left of which .075 + .100/2 or .1250 of the area lies. Hence, we 
enter Table C at .3749, which is as close as we can come to .3750, and 
find the corresponding z to be -1.15. The other medians and the 
organization of work for computing medians are shown in Table 6.4. 
The procedure for finding the median of a segment of the normal 
curve may be stated: 

a. Add one half of the proportion represented by the given segment to 
the total of proportions to the left of the segment. 

b. Find the difference between the sum and .5000. (If the sum is less than 
.5000, the sign of the median will be negative; if greater than .5000, the 
sign of the median will be positive.) 

c. In Table C find the value of z which corresponds to the difference and 
attach the proper sign. 


TABLE 6.4 
Computation of Normalized Median Values of 
Instructor's Ratings of 40 Students 
(Data from Table 6.3) 

RATING     PROPORTION     PROPORTION BELOW CLASS
OR         RECEIVING      PLUS 1/2 PROPORTION        MEDIAN
CLASS      RATING         IN CLASS

A            .250               .8750                  1.15
B            .225               .6375                   .35
C            .350               .3500                  -.39
D            .100               .1250                 -1.15
E            .075               .0375                 -1.78


The z value thus found is the median of the given segment. It is, of 
course, possible to work from the upper or right end of the scale, if 
desired. The student will encounter little difficulty in finding medians, 
particularly if he first sketches the normal curve and the segments 
corresponding to the given proportions. 
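
Steps a to c can be sketched in Python. The proportions are those of Table 6.3; the dictionary and variable names are introduced here for illustration, and `NormalDist.inv_cdf` from the standard library performs the look-up that the text carries out in Table C.

```python
from statistics import NormalDist

std = NormalDist()

# Proportions from Table 6.3, ordered from the lowest category to the highest.
proportions = {"E": 0.075, "D": 0.100, "C": 0.350, "B": 0.225, "A": 0.250}

medians = {}
below = 0.0  # total of proportions to the left of the current segment
for rating, p in proportions.items():
    # Area to the left of the ordinate that bisects the segment, then
    # the z value that cuts off that area (sign comes out automatically).
    medians[rating] = round(std.inv_cdf(below + p / 2), 2)
    below += p

print(medians)
```

Working from the cumulative area directly removes the need for step b, since the inverse of the cumulative curve returns a negative z whenever the area to the left is less than .5000.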

The procedure for finding the mean of a segment of the normal 
curve is considerably more involved than the procedure for finding 
the median, and we shall not describe it here. (See Ref. 24, p. 297.) 
In practical work the median usually is quite adequate. 


COMBINING QUALITATIVE DATA 


The transformation described above is frequently useful in com- 
paring or combining sets of qualitative data, such as judges' ratings, 
letter grades, and categorical ranks on a scale. 




It is well known that judges tend to differ substantially in their 
ratings of a group of individuals, with respect both to the rating of 
any given individual and to the proportions of the group placed in 
different categories. If it can be assumed that the rated variable is 
normally distributed, it is possible to use the transformation described 
above to arrive at an average rating for each individual, thus making 
allowance for varying standards of rating or degrees of leniency on 
the part of the judges. The transformed ratings may be summed and 
averaged like any scores. (See Ex. 9.) 

It should be noted that in many cases, perhaps the majority, 
the ratings of judges are hardly reliable enough to warrant refined 
treatment, the variation in the ratings itself being presumptive evidence 
of unreliability. However, whether or not it improves the reliability 
of average ratings, the method of combining ratings described above 
has three distinct advantages. First, it yields a quantitative average; 
second, it does not presuppose equal differences between categories; 
and finally, it makes allowance for varying standards of rating on the 
part of the judges. 

Strictly speaking, in combining qualitative data it would be 
better to use the normalized mean values of the segments or categories, 
since the mean values are algebraic in nature, but the refinement 
ordinarily is unwarranted in view of the nature of the original data. 


NORMALIZING NUMERICAL DATA 


The normal curve may be used to normalize numerical as well as 
qualitative data. Consider the problem-solving test scores of 293 
subjects in Table 6.5. The distribution is multimodal, platykurtic, and 
positively skewed. Let us assume that the form of the distribution is 
not due to sampling fluctuations, or better, let us suppose that quite 
similar distributions of scores were observed when the same test was 
given to other groups of similar subjects. 

It is sometimes the case, when the distribution of the scores on a 
test is stable but lacking in normality, that a more discriminating test 
results if the raw scores are normalized. 

The procedure in normalizing numerical data is exactly like that 
in normalizing qualitative data by determining median values of 
ordered categories, and the underlying assumptions are the same. 
Returning to Table 6.5, if the raw scores were ordered by letters instead 
of numbers, the correspondence between the two procedures would 
stand out clearly. In Table 6.5, as in Table 6.4, the work proceeds 
from the lower or left end of the scale, but it is possible to work from 
the upper or right end if desired. 


TABLE 6.5 

Computation of Normalized Scores of 293 Subjects 
on a Problem-Solving Test 

                     CUM f BELOW SCORE      VALUE OF z IN
                     PLUS 1/2 FREQUENCY     NORMAL CURVE
 RAW           CUM   AT SCORE               CORRESPONDING      T       Z
SCORE    f      f    NO.      PROPORTION    TO PROPORTION    SCORE   SCORE

  29     2    293    292        .9966           2.71           77      73
  28     1    291    290.5      .9915           2.39           74      72
  27     2    290    289        .9864           2.21           72      70
  26     3    288    286.5      .9778           2.01           70      69
  25     4    285    283        .9659           1.82           68      67
  24     8    281    277        .9454           1.60           66      66
  23    14    273    266        .9079           1.33           63      64
  22     6    259    256        .8737           1.14           61      63
  21    13    253    246.5      .8413           1.00           60      61
  20     9    240    235.5      .8038            .86           59      60
  19    14    231    224        .7645            .72           57      58
  18     9    217    212.5      .7253            .60           56      57
  17    13    208    201.5      .6877            .49           55      55
  16     6    195    192        .6553            .40           54      54
  15    17    189    180.5      .6160            .30           53      52
  14    15    172    164.5      .5614            .15           52      51
  13    19    157    147.5      .5034            .01           50      49
  12    13    138    131.5      .4488           -.13           49      48
  11    19    125    115.5      .3942           -.27           47      46
  10    15    106     98.5      .3362           -.42           46      45
   9    14     91     84        .2867           -.56           44      43
   8    14     77     70        .2389           -.71           43      42
   7    18     63     54        .1843           -.90           41      40
   6    10     45     40        .1365          -1.10           39      39
   5     6     35     32        .1092          -1.23           38      37
   4    11     29     23.5      .0802          -1.40           36      36
   3     3     18     16.5      .0563          -1.59           34      34
   2     7     15     11.5      .0392          -1.76           32      33
   1     4      8      6        .0205          -2.04           30      31
   0     4      4      2        .0068          -2.47           25      30

In the preceding section, we saw how a normal curve could be 
fitted to a given distribution. It is instructive to compare that 
procedure with the procedure in transforming numerical data. In the 
former, the frequencies in classes are made equal to the frequencies 
which would obtain if the distribution were normal; in the latter, the 
scores are made equal to values which normalize the distribution. 
In other words, in the former the frequencies are adjusted; in the 
latter the scores are adjusted. 
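
The normalizing procedure can be sketched in Python using the frequencies of Table 6.5. Each normalized z value is multiplied by 10 and added to 50 to give the scores in the table's T Score column. The names `freqs` and `t_scores` are introduced here for illustration, and the exact inverse of the normal curve (`NormalDist.inv_cdf`) is used rather than a two-decimal table look-up, so an occasional result could differ from a hand computation by one unit.

```python
from statistics import NormalDist

# Frequencies from Table 6.5 for raw scores 0, 1, ..., 29.
freqs = [4, 4, 7, 3, 11, 6, 10, 18, 14, 14, 15, 19, 13, 19, 15,
         17, 6, 13, 9, 14, 9, 13, 6, 14, 8, 4, 3, 2, 1, 2]
n = sum(freqs)
std = NormalDist()

t_scores = {}
below = 0  # cumulative frequency below the current raw score
for raw, f in enumerate(freqs):
    proportion = (below + f / 2) / n       # cum f below plus half of f
    z = std.inv_cdf(proportion)            # normalized z value
    t_scores[raw] = round(10 * z + 50)     # normalized score on the T scale
    below += f

print(n, t_scores[29], t_scores[16], t_scores[0])
```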


T SCORES AND Z SCORES 


The z values of normalized scores, such as those shown in Table 
6.5, always involve negative signs and decimals. Since these are 
inconvenient to work with, z values usually are multiplied by one 
constant and added to a second constant. When a median value of z in a 
segment of the normal distribution is multiplied by 10 and added to 
50, the resulting score is universally known as a "T score."* 

Let us look at the column headed "T Score" in Table 6.5. Since 
the median z value of the proportion of subjects scoring 29 on the test 
is 2.71, the T score is 10(2.71) + 50 or 77, rounded off to the nearest 
whole number. The student can verify the other T scores in the column. 
Thus, if T scores were used instead of raw scores in the 
problem-solving test, the 2 top subjects would have scores of 77 instead 
of 29, the 4 lowest subjects would have scores of 25 instead of 0, the 6 
subjects at raw score 16 would have scores of 54, and so on. 

It is interesting to compare T scores with Z scores. It will be re- 
called from Chapter V that a Z score is defined by 


Z = 10z + 50, 


in which z is a standard score in a distribution, whether the distribu- 
tion is normal or otherwise. (Throughout the present chapter, we have 
been concerned with z scores in the special case of the normal dis- 
tribution.) 

In order to determine the Z scores corresponding to the raw 
scores shown in Table 6.5, we must first determine the mean and 
standard deviation of the raw scores. The student can verify that 
these are M = 13.36 and s = 6.66. Hence, the Z score corresponding 
to the raw score 29 is 10(29 - 13.36)/6.66 + 50 or about 73; the 
Z score corresponding to the raw score 28 is 72, and so on, as entered 
in the last column of Table 6.5. Were it not for rounding, the 
differences between successive Z scores would be constant. 
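
For comparison with the normalized scores, the linear Z transformation can be sketched in a few lines of Python. The function name `big_z` is hypothetical; the mean and standard deviation are those quoted above.

```python
M, S = 13.36, 6.66   # mean and standard deviation reported for Table 6.5


def big_z(raw):
    """Linear standard score Z = 10z + 50, rounded to a whole number."""
    z = (raw - M) / S
    return round(10 * z + 50)


print([big_z(raw) for raw in (29, 28, 0)])

# Before rounding, each additional raw-score point adds the same amount.
step = 10 / S
print(round(step, 2))
```

The constant step of about 1.5 Z units per raw-score point is what keeps the Z scale proportional to the raw-score scale, in contrast to the T scale, whose steps vary with the frequencies.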

As was noted in Chapter V, the mean of a set of Z scores is 50 
and the standard deviation 10. However, the conversion to Z scores 
does not change the form of the distribution, and the units on the 
Z scale are proportional to the raw score scale throughout. In the 
special case of the normal distribution, Z scores are identical with 
T scores; the more the given distribution departs from normality the 
more marked are the differences between the two. 

* The term T score was originated by McCall (Ref. 31) in honor of Thorndike 
and Terman, pioneers in the application of statistics to educational measurement. 

Since T scores are based on the z scores of the normal curve, the 
T scores of two or more distributions are always comparable and 
combinable. Z scores, however, are comparable and combinable only 
if the distributions are normal or show similar departure from normal- 
ity. Moreover, T scores can always be interpreted without error in 
terms of percentile ranks; Z scores can be so interpreted only if the 
distributions are normal or nearly so. 


NORMALIZING RANKS 


The transformation of a set of ranks to normalized scores is a 
special and simple case of transforming numerical data. Consider a 
set of ranks, 1, 2,..., 20, in which 1 is the highest and 20 the lowest. 
Since the frequency at each rank is 1, the proportion receiving each 
rank is 1/20 or .05. Hence, the proportion below the midpoint of rank 
18, for example, is .0500 + .0500 + .0250 or .1250, the first two terms 
being the proportions at ranks 20 and 19 and the third being half the 
proportion at rank 18. According to Table C, the z value 
corresponding to .1250 at the lower end of the curve is -1.2. This 
gives a T score of 10(-1.2) + 50 or 38. In a set of 20 ranks, then, 
the rank 18 has a T score of 38. In Exercise 11, the student is asked to 
find the T scores corresponding to the other ranks in a set of 20. 
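
The rank transformation can be sketched in Python as follows. The function name `rank_to_t` is hypothetical, and the exact inverse of the normal curve is used where the text rounds z to one decimal place.

```python
from statistics import NormalDist

std = NormalDist()


def rank_to_t(rank, n):
    """T score of a rank, where rank 1 is highest and rank n is lowest."""
    # Proportion of the set lying below the midpoint of the given rank.
    proportion_below = (n - rank + 0.5) / n
    return round(10 * std.inv_cdf(proportion_below) + 50)


print(rank_to_t(18, 20))
```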

The transformation of scores and ranks to T scores is sometimes 
known as the T-transformation and the procedure as T-scaling. 


SUMMARY 


The normal curve has a great many practical applications in 
measurement and research. It is a convenient model for distributing 
categorical ratings and for transforming raw scores to small, whole 
numbers. It can be used in transforming and combining qualitative 
data and in normalizing numerical data. Various derived scores, such 
as T scores, are based upon the z and area relationships of the normal 
curve. 

In general, when a variable is normally distributed, normal curve 
properties can be utilized in refining gross measures of the variable. 
It should be kept constantly in mind, however, that if a variable is not 
normally distributed, the use of the curve in refining gross measures 
not only is unwarranted but actually introduces a source of error. 
Every assumption of normality needs to be carefully scrutinized. 




EXERCISES 


1. The successive terms of the binomial (1/2 + 1/2)^n give the theoretical 
relative frequencies of n, n - 1, n - 2,..., 2, 1, 0 heads in tossing n 
coins. For example, if 4 coins are tossed, the relative frequencies of 
4, 3, 2, 1, 0 heads are 1/16, 4/16, 6/16, 4/16, 1/16, respectively. If 4 coins 
were tossed 16 times, theoretically 4 heads and 0 tails would occur once; 
3 heads and 1 tail, four times; 2 heads and 2 tails, six times; 1 head and 
3 tails, four times; and 0 heads and 4 tails, once. What would it mean 
to say that the normal curve is the limiting form of the histogram of the 
binomial distribution? (See Ref. 54, pp. 177-179, for an algebraic proof 
that the normal curve is the limiting form of the binomial distribution.) 

2. By use of Table C, determine the proportion of area included by the 
ordinates at the following z values. Sketch a curve and crosshatch the 
area in each case. 

-∞ to 0.00            -∞ to -1.96
0.00 to +∞            +1.96 to +∞
-2.00 to +2.00        -1.96 to +1.96
-3.00 to +3.00        +0.20 to +2.30
0.00 to +2.58         -0.60 to +1.10
-1.64 to 0.00         -1.70 to -0.40

3. By use of Table C determine the z values (to hundredths) of the ordinates 
of the standard normal curve which include: 

a. The middle 25 per cent of the area. 
b. The middle 80 per cent of the area. 
c. The middle 95 per cent of the area. 
d. The middle 99 per cent of the area. 


4. The mean and standard deviation of a normal distribution of 500 scores 
are 75.0 and 12.5 respectively. 

a. How many scores lie between 50 and 100? 
b. How many scores lie below 62.5? 
c. What interval on the scale of scores includes the middle 250 scores? 
What are the values of P25, P75, and Q? 
d. What deviation from the mean will be exceeded by 7 per cent of the 
scores? 
e. What deviation from the mean will be exceeded by 93 per cent of the 
scores? 
f. If a score were selected at random from the 500, what are the chances 
that it will fall in the interval 62.5-87.5? That it will fall below 50? 
That it will fall above 100? 


5. Given a normal distribution in which N = 300, M = 42.00, and s = 7.50,

a. How many scores will fall in the class whose real limits are 39.5 and
   44.5?
b. How many scores will fall in the class whose upper real limit is 54.5?

6. The distribution of the Miller Analogies Test scores of 64 psychology
students is shown below. (a) Verify that the mean and standard deviation
are 68.50 and 9.76, respectively. (b) Fit a normal curve to the distribution.
Save the results for later reference.

SCORE    f      SCORE    f
88-91    2      64-67   16
84-87    3      60-63   10
80-83    6      56-59    3
76-79    6      52-55    3
72-75    4      48-51    3
68-71    8


7. The stanine scoring scheme consists essentially of assigning the numbers
1, 2, ..., 8, 9 to percentages of area falling within successive z-intervals
under the normal curve. One of the commonest schemes makes use of the
intervals -∞ to -1.75, -1.75 to -1.25, -1.25 to -.75, -.75 to -.25,
-.25 to .25, .25 to .75, .75 to 1.25, 1.25 to 1.75, 1.75 to +∞. (a) What
percentages (to the nearest whole numbers) of a group would receive the
various stanine scores? (b) What assumption underlies stanines?
(c) When should stanines not be used?


8. The mean and standard deviation of the distribution of reading scores in
Table 5.1 are 34.55 and 8.85, respectively. Use the area and z-score
relationships of Table C to find (a) the percentiles P90 and P10, and (b)
the percentile ranks of the scores 28 and 38. (c) Compare these with the
percentiles and percentile ranks obtained arithmetically. (d) Under what
conditions might the former be preferable to the latter?


9. The ratings of 20 children on resistance to authority, as assigned by three
psychologists, are given below, A indicating strong resistance, E little
resistance, and B, C, and D intermediate degrees. (a) What are two ways
of combining the ratings? (b) Compare the combined rating of a child
who receives C from the first psychologist, A from the second, and B from
the third with that of a child who receives C from the first, B from the
second, and A from the third.

               PSYCHOLOGIST
RATING    FIRST    SECOND    THIRD
  A         2         8        2
  B         2         4        2
  C        10         4        6
  D         4         2        4
  E         2         2        6

10. The distribution of the raw scores of 47 subjects on the Picture Completion
Subtest of the Wechsler-Bellevue Intelligence Scale is shown below.
(a) Find the mean and standard deviation of the distribution, and the Z
scores corresponding to the raw scores. (b) Normalize the data and find
the corresponding T scores. (c) Why do the Z and T scores disagree?

SCORE     f
  15      4
  14     10
  13     11
  12     11
  11      6
  10      1
   9      2
   8      1
   7      1


11. Twenty drawings were rated on unusualness or originality from 1, most, 
to 20, least. (a) Normalize the ranks and find the corresponding T scores. 
(b) What is assumed in converting ranks to T scores? 

12. Suppose that it is known that a large number of sample means are dis- 
tributed normally about a mean value (mean of means) of 40 with a 
standard deviation of 2. What proportion of the means are 38 or less?
42 or more? Between 38 and 42? 36 or less? 44 or more? Between 36 
and 44? 


CHAPTER VII 


Correlation and Regression 


It is the fundamental faith of science that the world is under- 
standable. In the search for understanding of man and his affairs, 
the most frequent question that arises concerns possible relation- 
ships between phenomena. Such questions as whether parental in- 
come is related to child continuation in school, whether test intelligence 
is related to academic success, whether supply of economic goods is 
related to price, whether broken homes are related to delinquency, 
are inevitable and endless. The explanation and prediction of natural 
and social phenomena necessarily rest upon demonstrable relation- 
ships. Variables which are not related to one or more other variables 
are of little importance. The central task of any branch of science is 
that of discovering and measuring relationships through comparisons 
of sets of data. As new relationships are found, understanding of the 
world is increased; when existing relationships permit prediction of 
events, control over the environment is extended. 

We can imagine primitive man accidentally discovering, say, that 
fertilizing his plants increased their yield and that the more fertilizer 
he added, up to a point, other factors being favorable, the more the 
increase in yield. We can imagine him discovering that those among
his fellows who learned one thing easily tended to learn other things
easily as well. Some time in the distant past he discovered that certain 
things were associated with his well-being, others with sickness. His 
welfare and the progress of his society depended upon finding out how 
things were associated or related to other things. Knowledge of re- 
lationships has always been the key to understanding and controlling 
the environment. 


STATISTICAL CORRELATION 


The term correlation is loosely used to refer to any sort of relationship
between objects or events. Such phrases as "the correlation
between crime and poverty" and "the correlation between fact and
theory" refer to relationships. In statistics, however, the term correlation
refers exclusively to relationships between variables that can be
quantified. The situation in which statistical correlation is applicable
is always one in which there is a pair of measures for each individual
or instance in a given group. In order to apply the method of correlation
to determine whether frustration and aggression are related,
for example, we must have measures of the two for a number of individuals;
in order to determine whether amount of unemployment and
retail sales are related, we must have records of the two in a number
of instances. If the relationship is such that large values of one variable
tend to be associated with large values of the other, the correlation
is positive; when large values of one tend to be associated with small
values of the other, the correlation is negative. When data consist of
pairs of measures, they are technically known as bivariate data. When
the two variables comprising bivariate data are correlated, either may
be spoken of as the correlative of the other.

Since statistical correlation provides a quantitative method by
which relationships can be investigated, it is a most useful tool in
social research.


CORRELATION IN THE SOCIAL SCIENCES 


In the physical sciences the correlations between many phe- 
nomena tend to be perfect, or nearly so. Thus, the expansion of mercury 
is so highly correlated with heat that, for ordinary ranges of temperature,
the mercury thermometer serves as a reliable instrument for
measuring heat. As temperature increases, the volume of mercury
increases proportionately. The amount of silver deposited by an electric
current in a unit of time is, in theory, perfectly correlated with the
strength of the current. In the physical sciences the correlatives of a
particular variable are usually so marked that they can be determined 
by ordinary observation or by experiment, and the correlation so 
nearly perfect that it can be stated as a law. 

In the biological and social sciences, correlations are much less 
marked and are nearly always weakened by exceptional instances, 
even when the variables can be satisfactorily measured. For example, 
weight of men is a positive correlative of height of men, but the correlation
is weakened by "short and heavy" and "tall and light" men.
The correlation between amount of money in circulation and employ- 
ment, between rainfall and crop yield, between chronological age and 
mental development, between test intelligence and school marks, 
although significant, is far from perfect. 

The student may well ask, if social phenomena tend to show quite 
imperfect relationships, why correlation is used in social research. 
If the alternative to statistical correlation were controlled experimenta- 
tion, the investigator would be indeed foolish to employ the former. 
But the alternative in many situations is guessing or the intuitive 
process of analysis known as judgment. Fisher emphasizes the useful- 
ness of the method of correlation in the words: 


No quantity has been more characteristic of biometrical work than 
the correlation coefficient, and no method has been applied to such 
various data as the method of correlation. Observational data in par- 
ticular, in cases where we can observe the occurrence of various possible 
contributory causes of a phenomenon, but cannot control them, has been 
given by its means an altogether new importance. ... 

One of the earliest and most striking successes of the method of cor- 
relation was in the biometrical study of inheritance. At a time when 
nothing was known of the mechanism of inheritance, or of the structure 
of the germinal material, it was possible by this method to demonstrate 
the existence of inheritance, and to “measure its intensity"; and this in 
an organism in which experimental breeding could not be practiced, 
namely, Man. By comparison of the results obtained from the physical 
measurements in man with those obtained from other organisms, it was 
established that man's nature is not less governed by heredity than that 
of the rest of the animate world. The scope of the analogy was further 
widened by demonstrating that correlation coefficients of the same mag- 
nitude were obtained for the mental and moral qualities in man as for 
physical measurements. 

These results are still of fundamental importance, for not only is in- 
heritance in man still incapable of experimental study, and existing 
methods of mental testing are still unable to analyze the mental disposition,
but even with organisms suitable for experiment and measurement,
it is only in the most favorable cases that the several factors causing
fluctuating variability can be resolved, and their effects studied, by 
Mendelian methods.* 


In correlation studies in the biological and social sciences, we 
ordinarily must observe many instances or cases of the related phe- 
nomena and then take a sort of average measure of the relationship 
present; the more instances, of course, the more reliable the measure. 
Statistical correlation refers to the average amount of relationship 
between two variables determined by investigation of a number of 
instances or cases of the relationship. 

Most variables in social research are enormously complex, in the 
sense of having numerous correlatives. It is generally the case that 
the greater the number of variables that are associated with a given 
variable, the less the latter tends to be correlated with a single one. 
When this is the case, there is little possibility of exerting sufficient 
controls to permit experimental study of the relationship between 
two variables. At best, only the general tendencies can be observed. 


The Product-Moment Coefficient of Correlation 


The most widely used and best measure of correlation is the
product-moment coefficient, developed by the English statistician,
Karl Pearson, about 1900. The nature of the coefficient can be brought
out clearly as we consider a practical problem.

Suppose we wish to determine the amount of relationship between
reading and vocabulary in, say, the eighth grade and suppose that we
have pairs of scores for a group of pupils, as shown in Table 7.1.

Inspection of the table indicates that the pupils who have high
scores in the reading test tend to have high scores in the vocabulary
test. This tendency stands out clearly when the 18 pairs of scores are
plotted as dots in the scatter diagram of Figure 7.1. For the most
part, scores above the mean in one test are paired with scores above
the mean in the other, and scores below the mean in one with scores
below the mean in the other. Moreover, the dots tend to fall in a
linear pattern extending from the lower left corner to the upper right
corner of the figure. (If they fell in a straight diagonal line, of course,
perfect correlation would be indicated.)

* Reprinted from R. A. Fisher, Statistical Methods for Research Workers,
published 1950 by Oliver and Boyd, Ltd., Edinburgh, by permission, pp. 175-176.


TABLE 7.1

Scores of 18 Eighth-Grade Pupils on Tests of
Reading Comprehension and Vocabulary

         READING   VOCABULARY   PRODUCT OF
PUPIL       X          Y         PAIR XY        X²         Y²
   1       57         52          2,964       3,249      2,704
   2       48         43          2,064       2,304      1,849
   3       48         32          1,536       2,304      1,024
   4       47         40          1,880       2,209      1,600
   5       40         44          1,760       1,600      1,936
   6       39         42          1,638       1,521      1,764
   7       39         32          1,248       1,521      1,024
   8       39         30          1,170       1,521        900
   9       37         30          1,110       1,369        900
  10       35         24            840       1,225        576
  11       34         26            884       1,156        676
  12       32         27            864       1,024        729
  13       30         30            900         900        900
  14       30         23            690         900        529
  15       28         22            616         784        484
  16       26         19            494         676        361
  17       24         34            816         576      1,156
  18       24         26            624         576        676
 SUM      657        576         22,098      25,415     19,788
MEAN     36.5       32.0
Fig. 7.1. Graphical presentation of paired raw scores. (From Table 7.1.)
[Scatter diagram: Reading X on the horizontal axis, Vocabulary Y on the
vertical axis, with Quadrants I to IV marked.]


In order to describe the amount of correlation characterizing the
data, we need a measure that is sensitive to the extent to which high
scores in reading are paired with high scores in vocabulary, intermediate
with intermediate, and low with low, i.e., the extent to which
the two series of scores vary together. Let us see how such a measure
can be derived from the sum of products of pairs of deviation scores.


THE SUM OF PRODUCTS OF DEVIATION SCORES AS A
MEASURE OF CORRELATION

The raw scores of Table 7.1 are shown as deviation scores in
Table 7.2. It will be noted that in each case the mean has been subtracted
from the score, as is always done in converting raw scores to
deviation scores. The products of the pairs of deviation scores with
their algebraic signs are shown in column 4 of the table.

TABLE 7.2

Deviation Scores of 18 Eighth-Grade Pupils on
Tests of Reading Comprehension and Vocabulary

         READING   VOCABULARY    PRODUCT          SQUARE
PUPIL       x          y            xy           x²         y²
   1      +20.5      +20.0       +410.0       420.25     400.00
   2      +11.5      +11.0       +126.5       132.25     121.00
   3      +11.5        0.0          0.0       132.25       0.00
   4      +10.5      + 8.0       + 84.0       110.25      64.00
   5      + 3.5      +12.0       + 42.0        12.25     144.00
   6      + 2.5      +10.0       + 25.0         6.25     100.00
   7      + 2.5        0.0          0.0         6.25       0.00
   8      + 2.5      - 2.0       -  5.0         6.25       4.00
   9      +  .5      - 2.0       -  1.0          .25       4.00
  10      - 1.5      - 8.0       + 12.0         2.25      64.00
  11      - 2.5      - 6.0       + 15.0         6.25      36.00
  12      - 4.5      - 5.0       + 22.5        20.25      25.00
  13      - 6.5      - 2.0       + 13.0        42.25       4.00
  14      - 6.5      - 9.0       + 58.5        42.25      81.00
  15      - 8.5      -10.0       + 85.0        72.25     100.00
  16      -10.5      -13.0       +136.5       110.25     169.00
  17      -12.5      + 2.0       - 25.0       156.25       4.00
  18      -12.5      - 6.0       + 75.0       156.25      36.00
 SUM        0          0       +1,074.0      1,434.5    1,356.0


Now the sum of the products, shown at the foot of the table, is a
measure of the extent to which reading scores are associated
with vocabulary scores of proportional magnitude. If the association
were stronger, the sum would be increased; if weaker, the sum would
be decreased. For example, if the association were made stronger by
interchanging the vocabulary scores of pupils 3 and 5, the sum of 
products would be 1,170.0 instead of 1,074.0. On the other hand, if 
the association were made weaker by interchanging the vocabulary 
scores of pupils 9 and 15, the sum of products would be 1,002.0 instead 
of 1,074.0. The student should experiment with other favorable and 
unfavorable interchanges until he is satisfied that the sum of products 
of deviation scores is sensitive to the amount of association or cor- 
relation between reading and vocabulary scores. 
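
The interchanges described above can be tried out with a few lines of code. The following is a minimal sketch in Python, not part of the original text; the function names `sum_of_products` and `swap_y` are illustrative, and the scores are those of Table 7.1.

```python
# Raw scores from Table 7.1 (reading X, vocabulary Y) for the 18 pupils.
X = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
Y = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]


def sum_of_products(xs, ys):
    """Sum of the products of paired deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def swap_y(ys, pupil_a, pupil_b):
    """Copy of ys with the scores of two pupils (numbered from 1) interchanged."""
    out = list(ys)
    out[pupil_a - 1], out[pupil_b - 1] = out[pupil_b - 1], out[pupil_a - 1]
    return out


original = sum_of_products(X, Y)                 # 1,074.0 as in Table 7.2
stronger = sum_of_products(X, swap_y(Y, 3, 5))   # pupils 3 and 5 interchanged
weaker = sum_of_products(X, swap_y(Y, 9, 15))    # pupils 9 and 15 interchanged
print(original, stronger, weaker)
```

Other pairs of pupil numbers can be passed to `swap_y` to try further favorable and unfavorable interchanges.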

In addition to being sensitive to the amount of correlation, the 
sum of products of deviation scores indicates whether the correlation 
is positive or negative. This is clearly seen when we examine the 
deviation scores plotted in the scatter diagram of Figure 7.2. The 
signs of the products of pairs of deviation scores are positive in quad- 
rants I and III and negative in quadrants II and IV. The majority of 
dots lie in the former two quadrants and the sum of products is positive. 
If the majority of dots had fallen in quadrants II and IV, the sum of
products would have been negative. This would have been the case
if reading scores had been inversely associated with vocabulary scores.
(If the dots had been distributed evenly over the four quadrants, the
sum of products would have been zero, indicating absence of correlation.)
Thus, the sum of products reflects both the amount and the
direction of correlation. Although we have developed this idea with
reference to a particular set of data, the argument may readily be
extended to any set of quantitative bivariate data.

Fig. 7.2. Graphical presentation of paired deviation scores. (From Table 7.2.)
[Scatter diagram: Reading x on the horizontal axis, Vocabulary y on the
vertical axis, with Quadrants I to IV marked.]
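
The quadrant argument can be checked by counting the pupils in each quadrant and adding up the products quadrant by quadrant. This is a minimal sketch in Python under the assumption that quadrants are numbered as in the figure (I upper right, II upper left, III lower left, IV lower right); the variable names are illustrative and not from the text.

```python
# Raw scores from Table 7.1 (reading X, vocabulary Y).
X = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
Y = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

labels = ["I", "II", "III", "IV", "on an axis"]
counts = {label: 0 for label in labels}
products = {label: 0.0 for label in labels}

for raw_x, raw_y in zip(X, Y):
    x = raw_x - mean_x          # deviation score in reading
    y = raw_y - mean_y          # deviation score in vocabulary
    if x == 0 or y == 0:
        label = "on an axis"    # product is zero, belongs to no quadrant
    elif x > 0 and y > 0:
        label = "I"
    elif x < 0 and y > 0:
        label = "II"
    elif x < 0 and y < 0:
        label = "III"
    else:
        label = "IV"
    counts[label] += 1
    products[label] += x * y

print(counts)
print(products)
print(sum(products.values()))   # total sum of products
```

Quadrants I and III contribute only positive products and quadrants II and IV only negative ones, so the sign of the total depends on which pair of quadrants dominates.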

Although sensitive to the amount and direction of correlation,
the sum of products of deviation scores is limited as a measure of
correlation because it is independent neither of the size of the group
nor of the units of measurement. If there had been more than 18
pupils in the illustrative problem, the sum of products would have
been affected, whether or not the additional pairs of scores contributed
to the amount of correlation. Furthermore, the sum of products
would have been affected if the scores on the tests had been systematically
larger or smaller or if the unit of measurement had been
different. If, for example, each item in the vocabulary test had counted
1/2 instead of 1, the sum would have been 537.0 instead of 1,074.0.
If the reading test had been twice as long and if the pupils had done
proportionally as well on the longer test, the sum would have been
doubled. As a more general illustration, if the heights and weights
of a group of men were recorded in the metric system, the sum of
products of deviation heights and weights would be quite different
from that which would be obtained if heights and weights were recorded
in the English system.

Before the sum of products of deviation scores can be used as a
general measure of correlation, it is necessary to introduce refinements
which will eliminate the effects of size of sample and unit of
measurement. These refinements are discussed in the following
paragraphs.
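
The dependence of the sum of products on the unit of measurement can be seen numerically. The sketch below, in Python, is an illustration added here rather than part of the original text; `sum_of_products` is a hypothetical helper name, and the data are those of Table 7.1.

```python
# Raw scores from Table 7.1 (reading X, vocabulary Y).
X = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
Y = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]


def sum_of_products(xs, ys):
    """Sum of the products of paired deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


half_credit_Y = [y / 2 for y in Y]   # each vocabulary item counts 1/2
doubled_X = [2 * x for x in X]       # reading test twice as long, same proportion right

base = sum_of_products(X, Y)
halved = sum_of_products(X, half_credit_Y)
doubled = sum_of_products(doubled_X, Y)
print(base, halved, doubled)
```

The pattern of association is identical in all three cases; only the unit has changed, yet the sum is halved or doubled.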


THE MEAN PRODUCT OF STANDARD SCORES AS A
MEASURE OF CORRELATION

The effect of sample size on the sum of products of deviation
scores $\sum xy$ can easily be eliminated by dividing by the number $N$ in
the sample (the number of pairs of scores). The quotient $\sum xy/N$,
being the mean of the products, is independent of size of sample.

If we can now free the quotient $\sum xy/N$ from the effect of the
units of measurement, we shall arrive at a perfectly general measure
or coefficient of correlation. In an earlier chapter we learned that a
standard score is independent of the unit in which the original measurement
is made. Since this is true, it remains only to divide $\sum xy/N$ by
$s_x$ and $s_y$ to arrive at a perfectly general measure of correlation. In
other words, a measure or coefficient of correlation between two variables,
independent of the size of the sample and the units of measurement, can
be determined by dividing the mean product of the paired deviation
scores by the standard deviations of the scores. It will be seen that this
procedure is equivalent to finding the mean product of paired standard
scores, although ordinarily the standard scores are not actually
computed.

Let us illustrate the procedure by determining the coefficient of 
correlation of the data of Table 7.2. The mean of the products of 
paired deviation scores is 1,074/18 or 59.67. The standard deviations 
are $\sqrt{1{,}434.5/18}$ and $\sqrt{1{,}356/18}$, or 8.93 and 8.68. When we perform
the successive divisions we obtain .77. Theoretically, .77 is the
coefficient of correlation we would obtain between reading compre- 
hension and vocabulary, no matter how many comparable eighth- 
grade pupils we measured with the given tests and no matter whether 
we scored correct responses 1 or 2 or 1/2 and so on. Practically, of 
course, the unreliability of the tests and sampling fluctuations would 
affect the coefficient. 
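
The successive divisions can be followed step by step in code. This is a minimal sketch in Python, assuming standard deviations computed with divisor N as in the text; the function names are illustrative and do not come from the original.

```python
import math

# Raw scores from Table 7.1 (reading X, vocabulary Y).
X = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
Y = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with divisor N, as used in the text."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def mean_product(xs, ys):
    """Mean of the products of paired deviation scores."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def r_xy(xs, ys):
    """Mean product of deviation scores divided by both standard deviations."""
    return mean_product(xs, ys) / (sd(xs) * sd(ys))


r = r_xy(X, Y)
print(round(mean_product(X, Y), 2), round(sd(X), 2), round(sd(Y), 2), round(r, 2))

# Changing the unit of either test leaves the coefficient unchanged.
r_half = r_xy(X, [y / 2 for y in Y])
r_double = r_xy([2 * x for x in X], Y)
print(round(r_half, 2), round(r_double, 2))
```

Rescaling either variable changes the mean product and the standard deviation by the same factor, which is why the quotient stays put.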


COMPUTATION OF THE PRODUCT-MOMENT 
COEFFICIENT OF CORRELATION 


We may summarize the above procedure in the formula

\[
r_{xy} = \frac{\sum xy}{N s_x s_y} \tag{7.1}
\]

This is the basic formula for the Pearson product-moment coefficient
of correlation, commonly designated by $r_{xy}$. Since $s_x = \sqrt{\sum x^2/N}$
and $s_y = \sqrt{\sum y^2/N}$, the basic formula may be written

\[
r_{xy} = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} \tag{7.2}
\]

Using the latter formula in our example (see Table 7.2 for sums),
we have

\[
r_{xy} = \frac{1{,}074.0}{\sqrt{(1{,}434.5)(1{,}356.0)}} = .77.
\]


It is usually the case in practical correlation problems that the
means of the scores turn out to be decimal numbers. When this is
the case, the deviation score formulas (7.1) and (7.2) will involve
rather tedious arithmetic. If we substitute $X - \bar{X}$ for $x$ and $Y - \bar{Y}$
for $y$ in formula (7.2), expand, substitute $\sum X/N$ and $\sum Y/N$ in terms
containing $\bar{X}$ and $\bar{Y}$, and simplify, we obtain the raw score formula

\[
r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}
{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} \tag{7.3}
\]

Applying the formula to the raw scores of Table 7.1, where the needed
sums are shown at the foot of the table, we have

\[
r_{xy} = \frac{18(22{,}098) - (657)(576)}
{\sqrt{[18(25{,}415) - (657)^2][18(19{,}788) - (576)^2]}} = .77.
\]
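
The raw score computation needs only the five sums at the foot of Table 7.1. The following Python sketch is an added illustration, not the author's procedure; the function name `r_from_raw_sums` and its parameter names are hypothetical.

```python
import math


def r_from_raw_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Product-moment coefficient from raw score sums, following formula (7.3)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Sums from the foot of Table 7.1.
r = r_from_raw_sums(n=18, sum_x=657, sum_y=576,
                    sum_xy=22_098, sum_x2=25_415, sum_y2=19_788)
print(round(r, 2))  # 0.77
```

Working from integer sums avoids the decimal deviation scores that make formulas (7.1) and (7.2) tedious by hand.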


In summary, in computing $r_{xy}$ from deviation scores x and y, the
data are organized and treated as shown in Table 7.2 and the appropriate
substitutions are made in either formula (7.1) or (7.2). In
computing $r_{xy}$ from raw scores X and Y, the data are organized and
treated as shown in Table 7.1 and substitutions are made in formula (7.3).

It should be noted that formula (7.3) is particularly convenient
for machine calculation. By use of an electric desk calculator, it is
possible to get the five sums needed in the formula simultaneously.
It should also be noted that changing all of the X's or all of the Y's or
all of both by a constant amount does not affect $r_{xy}$. The constant used
for the X's does not have to be the same as that used for the Y's.
(See Ex. 11.) Thus, the X's and the Y's may be reduced and work
thereby saved in using formula (7.3).
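
The effect of reducing the scores by constants can be checked directly. In this Python sketch, added for illustration, the constants 24 and 19 (the lowest reading and vocabulary scores in Table 7.1) are an arbitrary choice and `r_raw` is a hypothetical helper name.

```python
import math

# Raw scores from Table 7.1 (reading X, vocabulary Y).
X = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
Y = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]


def r_raw(xs, ys):
    """Product-moment coefficient by the raw score formula (7.3)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Subtract a different constant from each variable to shrink the numbers.
reduced_X = [x - 24 for x in X]
reduced_Y = [y - 19 for y in Y]

r_original = r_raw(X, Y)
r_reduced = r_raw(reduced_X, reduced_Y)
print(round(r_original, 2), round(r_reduced, 2))
```

The reduced scores give much smaller products and squares, which is the saving in work the text refers to.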

The student is urged not to follow the computational procedures
blindly, but to keep in mind the purpose of the procedure, namely,
that of determining a sensitive measure of the relationship between
two variables, a measure based on the products of pairs of scores.


THE COMPUTATION OF $r_{xy}$ FOR GROUPED DATA


For several reasons, it may be desirable to group the data in a
two-way frequency distribution or correlation table when computing
$r_{xy}$. As we shall see in the next section, $r_{xy}$ is a fair measure of relationship
only when the data are linearly related, if related at all. When
data are grouped in a correlation table, departure from linearity of
sufficient extent to question the use of $r_{xy}$ can ordinarily be detected
by inspection. Moreover, the correlation table or two-way frequency
distribution is often a compact and informative way of presenting
bivariate data. Finally, unless an electric calculator is available,
grouping the data before computing $r_{xy}$ may save considerable time.

Consider the Regents' averages and semester averages of the 
freshmen of Table A, Appendix. There are 76 freshmen who have 
both averages. The first step in making a correlation table is to decide 
on an appropriate grouping scheme for the data. The range of the 
Regents’ averages is from 74.4 to 96.5, so a class interval of 2.0, giving 
12 classes, is appropriate. The range of semester averages is from 54.4 
to 93.6, and this suggests an interval of 3.0, giving 14 classes. After 
the classes are labeled, as shown in Table 7.3, tallies to represent pairs 
of scores are entered in the table. Student 003 of Table A, for example, 
has a Regents' average, X, of 85.2 and a semester average, Y, of 68.6, 
so a tally is entered in the space where column 84.0-85.9 intersects 
row 66.0-68.9. A tally for student 004 is entered in the space where 
column 86.0-87.9 intersects row 72.0-74.9, and so on, until the 76 
pairs of scores are entered in Table 7.3. It will be noticed that the 
frequencies in column 1 at the right of the table and row a at the foot 
sum to 76, the total number of pairs of scores. 
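
Locating the cell for a pair of scores amounts to finding the class that contains each score. The Python sketch below is an added illustration; it assumes, from the ranges and examples given above, that the lowest Regents' class begins at 74.0 and the lowest semester class at 54.0, and that scores are recorded to tenths. The function name `class_limits` is hypothetical.

```python
import math


def class_limits(score, lowest_limit, width):
    """Return the (lower, upper) score limits of the class containing `score`.

    Assumes scores are recorded to one decimal place, so a class of width 2.0
    beginning at 84.0 runs from 84.0 to 85.9.
    """
    index = math.floor((score - lowest_limit) / width)
    lower = lowest_limit + index * width
    upper = lower + width - 0.1
    return round(lower, 1), round(upper, 1)


# Student 003 of Table A: Regents' average 85.2, semester average 68.6.
column = class_limits(85.2, lowest_limit=74.0, width=2.0)
row = class_limits(68.6, lowest_limit=54.0, width=3.0)
print(column, row)
```

The tally for student 003 is entered where the column and row returned above intersect.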

If we assume that the tallies represent scores having mid-values 
of their respective classes, we may code class midpoints of both dis- 
tributions, just as we would if we were going to find the means and 
standard deviations. The coded values of X are shown in row b at the 
foot of the table, and the coded values of Y are shown in column 2 at 
the right of the table. The entries in columns 1, 2, 3, and 4 and in rows 
a, b, c, and d are obtained exactly as in computing the standard de- 
viation for grouped and coded data. Subscripts for f, d, etc., are 
needed, of course, because we are working with two variables, X and Y. 

The entries in columns 5 and 6 and in rows e and f require explana- 
tion. Consider the four tallies in the 87.0-89.9 row. The coded values 
of the first are 1 and 5, as seen in row b and column 2; the coded values 
of the second are 3 and 5; the coded values of the third and fourth are 
4 and 5 and 4 and 5. The sum of the products of the coded values is, 
then, 1(5) + 3(5) + 4(5) + 4(5) or 60. Notice that 5 is a common 
multiplier and that, consequently, we may find the sum of products by 
finding the product 5(1 + 3 + 4 + 4) or 5(12). The entry in column 
5 is merely the sum of the $d_x$ values; the entry in column 6 is the product
of the sum and the common $d_y$ value.

TABLE 7.3

Computation of the Product-Moment Coefficient of Correlation in Grouped Data

[Correlation table; the body of the table is not reproduced. Columns give
Regents' average X in 12 classes, 74.0-75.9 through 96.0-97.9; rows give
semester average Y in 14 classes, 54.0-56.9 through 93.0-95.9. Columns (1)
to (6) at the right give $f_y$, $d_y$, $f_y d_y$, $f_y d_y^2$, $\sum d_x$, and $d_y \sum d_x$;
rows (a) to (f) at the foot give $f_x$, $d_x$, $f_x d_x$, $f_x d_x^2$, $\sum d_y$, and
$d_x \sum d_y$. Sums: $N = 76$, $\sum f_x d_x = 8$, $\sum f_x d_x^2 = 470$, $\sum f_y d_y = 44$,
$\sum f_y d_y^2 = 468$, $\sum(d_y \sum d_x) = 314$.]

The entries in rows e and f are similarly explained. Consider the
three tallies in the column 92.0-93.9. Their coded values are 3 and 5,
3 and 2, and 3 and -2, as read in row b and column 2. Since the
common $d_x$ value is 3, the sum of products is 3(5 + 2 - 2) or 15.
The entry in row e is the sum of the $d_y$ values in a column, and the
entry in row f is the product of the sum and the common $d_x$ value.
The student should verify the other entries in columns 5 and 6 and in
rows e and f. In computing the sums for column 5 or row e, a strip of
paper graduated in coded units will be found helpful.
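
The shortcut of factoring out the common coded value can be compared with the term-by-term sum. This Python sketch is an added illustration using the row and column discussed in the text; the function names are hypothetical.

```python
def products_term_by_term(common_value, other_values):
    """Sum of products, multiplying each tally's pair of coded values."""
    return sum(common_value * other for other in other_values)


def products_factored(common_value, other_values):
    """Same sum, taking the common coded value outside the sum."""
    return common_value * sum(other_values)


# Row 87.0-89.9: common d_y is 5; the d_x values of its four tallies are 1, 3, 4, 4.
row_sum_dx = sum([1, 3, 4, 4])                        # entry in column 5
row_products = products_factored(5, [1, 3, 4, 4])     # entry in column 6

# Column 92.0-93.9: common d_x is 3; the d_y values of its three tallies are 5, 2, -2.
col_sum_dy = sum([5, 2, -2])                          # entry in row e
col_products = products_factored(3, [5, 2, -2])       # entry in row f

print(row_sum_dx, row_products, col_sum_dy, col_products)
```

Both routes give the same totals; the factored form simply needs one multiplication per row or column instead of one per tally.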
When data are grouped and coded, formula (7.3) becomes

\[
r_{xy} = \frac{N\sum(d_y \sum d_x) - (\sum f_x d_x)(\sum f_y d_y)}
{\sqrt{N\sum f_x d_x^2 - (\sum f_x d_x)^2}\,\sqrt{N\sum f_y d_y^2 - (\sum f_y d_y)^2}} \tag{7.4}
\]

Let us apply the formula to the data of Table 7.3. The various
sums to be substituted in the formula are shown in the lower right-hand
corner of the table. Substituting, we have

\[
r_{xy} = \frac{76(314) - 8(44)}
{\sqrt{76(470) - (8)^2}\,\sqrt{76(468) - (44)^2}} = .679.
\]

Thus, the coefficient of correlation between Regents’ and semester 
averages, as computed after grouping, is .679. The coefficient computed
before grouping is .684, so here, as is generally the case when N
is more than about 30, little accuracy is lost if the data are grouped.
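
The substitution in formula (7.4) can be reproduced from the sums quoted in the text. This is a minimal Python sketch added for illustration; the variable names are not from the original.

```python
import math

# Sums for the grouped and coded data of Table 7.3, as quoted in the text.
N = 76
sum_dy_sum_dx = 314   # sum over rows of d_y times the row's sum of d_x
sum_fx_dx = 8
sum_fy_dy = 44
sum_fx_dx2 = 470
sum_fy_dy2 = 468

numerator = N * sum_dy_sum_dx - sum_fx_dx * sum_fy_dy
denominator = (math.sqrt(N * sum_fx_dx2 - sum_fx_dx ** 2)
               * math.sqrt(N * sum_fy_dy2 - sum_fy_dy ** 2))
r_grouped = numerator / denominator
print(round(r_grouped, 3))  # 0.679
```

The class intervals (2.0 and 3.0) do not appear anywhere in the computation, since the coefficient is independent of the unit of measurement.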

If the means and standard deviations of the distributions in the
correlation table are desired, they can be readily computed. By formula
(3.4) the means of the Regents' and semester averages are, respectively,

\[
\bar{X} = 86.95 + \left(\frac{8}{76}\right)2 = 87.16
\]
\[
\bar{Y} = 73.45 + \left(\frac{44}{76}\right)3 = 75.19,
\]

and by formula (4.9) the standard deviations are, respectively,

\[
s_x = \frac{2}{76}\sqrt{76(470) - (8)^2} = 4.97
\]
\[
s_y = \frac{3}{76}\sqrt{76(468) - (44)^2} = 7.24.
\]
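
The means and standard deviations follow from the same coded sums. In this Python sketch, added for illustration, the arbitrary origins 86.95 and 73.45 and the class intervals 2.0 and 3.0 are taken from the text; the function names are hypothetical.

```python
import math


def coded_mean(origin, sum_fd, n, interval):
    """Mean from coded deviations: origin plus mean coded deviation times interval."""
    return origin + (sum_fd / n) * interval


def coded_sd(sum_fd, sum_fd2, n, interval):
    """Standard deviation (divisor N) from coded deviations."""
    return (interval / n) * math.sqrt(n * sum_fd2 - sum_fd ** 2)


N = 76
mean_x = coded_mean(origin=86.95, sum_fd=8, n=N, interval=2.0)
mean_y = coded_mean(origin=73.45, sum_fd=44, n=N, interval=3.0)
sd_x = coded_sd(sum_fd=8, sum_fd2=470, n=N, interval=2.0)
sd_y = coded_sd(sum_fd=44, sum_fd2=468, n=N, interval=3.0)
print(round(mean_x, 2), round(mean_y, 2), round(sd_x, 2), round(sd_y, 2))
```

Unlike the coefficient of correlation, these values do depend on the class intervals, which is why the intervals reappear here.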


The student will recall that neither the mean nor standard deviation
for grouped data is affected by the position of the arbitrary
origin. When the origins are taken at the lowest classes of the two
distributions in the correlation table, only positive numbers appear. 
The advantage of having only positive numbers to work with, however, 
is paid for by the larger products and sums which result. 

There are a great many commercial blank correlation tables,
some of which provide elaborate checks of the correctness of the
computations. The majority of these are so involved that they tend
to make the beginning student a blind follower of directions. The
setup shown in Table 7.3 provides checks at two crucial points, it can
be quickly laid out on any cross-section paper, and it is straightforward,
convenient, and simple. After he has worked several correlation
problems using the setup as shown in the table, the student
is advised to drop rows e and f and to check the $d_y \sum d_x$ column by
repeating the operations it calls for. All computations without cross-checks
should, of course, be performed twice to insure accuracy.

Little more needs to be said regarding the computation of $r_{xy}$ for
grouped data. The work begins with the construction of a correlation
table in which the X class intervals are indicated across the top and the
Y class intervals at the left, the X scores increasing from left to right,
and the Y scores from bottom to top. Ordinarily there should be
between 10 and 20 classes for X and between 10 and 20 classes for Y.
Each pair of scores is then entered as a tally in the appropriate cell.
The tallies in the respective rows are summed to give $f_y$ and those in
the columns to give $f_x$. Next, the columns at the right and the rows at
the bottom are completed as shown in Table 7.3. Finally, the appropriate
sums are entered in formula (7.4). The student can convince
himself that $\sum(d_y \sum d_x)$ in the correlation table is the sum of the products
of paired coded scores by actually computing each product and summing
over the entire table.


The Meaning of Correlation


Such phrases as "low correlation," "moderate correlation," and
"high correlation" have little to commend them save convenience.
Although the phrases will inevitably persist, it should be
understood that the coefficient of correlation is too complex in nature
and too diverse in use to permit general interpretation as low, moderate,
or high.

When the correlatives of a given variable are sought, "low" coefficients
may be important because they demonstrate relationship
where none was believed to exist. Coefficients of zero may be important
if they discredit superstitious beliefs regarding the causes of a
given phenomenon. On the other hand, “high” coefficients may be 
more or less meaningless if the sample is small. As will be shown in a 
later chapter, correlation coefficients are extremely susceptible to sam- 
pling fluctuations when samples are small, and relatively large coef- 
ficients may occur due to chance alone. For example, nearly 15 per 
cent of the random samples of size 10 from a normal bivariate popula- 
tion in which there is no correlation would be expected to yield product- 
moment coefficients of .40 or more. 
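The size of such chance effects can be checked numerically. The sketch below is an illustration, not part of the original text. It assumes the standard result that, for samples of size n from a bivariate normal population with zero correlation, the sampling density of r is proportional to (1 - r²) raised to the power (n - 4)/2, and it integrates that density by Simpson's rule. The function name `prob_r_at_least` is hypothetical.

```python
def prob_r_at_least(r0, n, steps=2000):
    """Chance that a sample r is r0 or more when the population correlation is 0.

    Assumption: bivariate normal population, so the density of r on [-1, 1]
    is proportional to (1 - r**2) ** ((n - 4) / 2).
    """
    power = (n - 4) / 2

    def density(r):
        return (1.0 - r * r) ** power

    def simpson(lo, hi):
        # Composite Simpson's rule; `steps` must be even.
        h = (hi - lo) / steps
        total = density(lo) + density(hi)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * density(lo + i * h)
        return total * h / 3

    # Normalize by the area over the whole range so the result is a probability.
    return simpson(r0, 1.0) / simpson(-1.0, 1.0)


p_one_sided = prob_r_at_least(0.40, 10)
print(f"P(r >= .40 | n = 10, no correlation) = {p_one_sided:.3f}")
```

Under these assumptions the chance comes out at roughly one in eight for a coefficient of +.40 or more, a figure of the same order as the one quoted above; the chance of a coefficient that large in either direction is twice that.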

The product-moment coefficient, $r_{xy}$, is a sensitive measure of the
association or relationship between two variables. Since it is inde- 
pendent of the unit of measurement and of the number in the sample, 
it is an abstract measure. A coefficient of +1.00 indicates perfect positive
relationship; one of -1.00, perfect negative relationship. Perfect
correlations, even in the exact sciences, exist only theoretically. 

The larger the absolute value of $r_{xy}$, the stronger the relationship
between two variables. The coefficient is a complex function, however,
and two coefficients cannot be directly compared. A coefficient of
.80, for example, indicates more than twice as much relationship as
one of .40. Later we shall see that the strength of relationship is
directly related to the square of $r_{xy}$, and that a coefficient of .80
indicates a strength of relationship four times that indicated by a
coefficient of .40. A negative coefficient indicates the same amount of
relationship as a positive coefficient of the same size; however, the 
former indicates inverse relationship, whereas the latter indicates 
direct relationship. 

Just what $r_{xy}$ means and just how it is interpreted are best seen
in connection with its uses. These include

1. The estimation or prediction of values of one variable from given
values of a related variable, or, simply, the prediction of Y from X.
2. The analysis of relationships between two or more variables.
3. The investigation of causal relationships.
4. The control of the effect of one variable on one or more others.
5. The study of the statistical validity and reliability of psychological
tests.
6. The estimation of the coefficient of correlation in the population.
7. The adjustment of experimental data from two or more groups for
initial differences between the groups.

Several of the uses overlap, and several are sterile unless there is cor- 
relation in the population. 




We shall consider the first four of the uses in the following sec- 
tions of this chapter and the fifth and sixth in subsequent chapters. But 
first we need to emphasize the conditions under which $r_{xy}$ is a proper
or fair measure of relationship. 


ASSUMPTIONS UNDERLYING rxy 

The condition under which $r_{xy}$ is a fair measure of relationship is
commonly called the assumption of linearity. Linearity refers to the
tendency of the data, when plotted, to follow a straight line as closely
as, or more closely than, some other curve. Two of the diagrams in Figure
7.3 illustrate linearly related data; two, nonlinearly related data. When
the data follow some other curve more closely than a straight line, $r_{xy}$
underestimates the amount of relationship. Although there are statistical
tests for the assumption of linearity, none is particularly powerful,
and ordinarily nonlinearity sufficiently marked to invalidate the use of
$r_{xy}$ can be detected by inspection of the scatter diagram or correlation
table.

While no assumption relating to the form of the distributions of
correlated variables underlies $r_{xy}$, most of the uses of $r_{xy}$ do
presuppose, at some point in their application, that X and Y are
distributed normally in the population, and the student is cautioned that
an $r_{xy}$ computed for other than normally distributed data may have
little usefulness except as a descriptive measure.

Fig. 7.3. Linearly related data: (A) Heights and weights of adult males,
(B) Scores on tests of reading comprehension and problem solving. Non-linearly
related data: (C) Chronological ages (5 to 17 years) and scores on an
intelligence test, (D) Speed of response (minutes) and total scores on a
mental test. [Axes: (A) Height X, Weight Y; (B) Reading Comprehension X,
Problem Solving Y; (C) Chronological Age X, Score Y; (D) Minutes X, Score Y.]


Linear Regression 


Before taking up statistical prediction, we need to consider the 
property of bivariate data known as regression. The fact that cor- 
related data tend to establish a pattern when plotted in the scatter 
diagram or correlation table suggests another way of measuring and 
describing the relationship between two variables, namely, that of 
describing the pattern. The mathematical description of the pattern 
is called the regression equation. Since we are concerned here with 
data that are linearly related, if related at all, the regression equation 
of concern is the equation of a straight line. This is the simplest of 
the patterns and one which characterizes a great many of the bivariate 
data of psychological and educational research. Kelley observes 
(Ref. 25, Preface): 


Many years ago my inspiring teacher, Henry Lewis Rietz, observed 
that statistical procedures derived from the normal distribution were 
born of a higher realm than other procedures. I disbelieved this with a 
religious fervor. Though, as time has passed, I have espoused curvilinear 
regression and skew distributions with gusto, I have found myself
frequently slipping, for the data would not support me, and linear
relationships and nearly normal distributions have in my experience as a
psychologist cropped up with a frequency which has chided and mocked me.
I still reserve judgment as to the place of birth of the normal distribution, 
but that its sphere of usefulness is extended in connection with biological 
and psychological phenomena I no longer have the slightest doubt. 


THE REGRESSION EQUATION 
It will be recalled from elementary algebra that the general equa- 
tion of a straight line is 
Y = bX + a, 


in which b is the slope of the line (the tangent of the angle the line
makes with the X-axis) and a is the Y intercept (the value of Y when
X is zero). When b and a are given, the equation describes one and
only one straight line. For example, if b is 2 and a is 5, we have

Y = 2X + 5.

By substituting convenient values, say 2, 3, and 4 for X, we have
X = 2, Y = 9; X = 3, Y = 11; X = 4, Y = 13. The points
(2, 9), (3, 11), (4, 13) form a straight line, and all values which satisfy
the equation Y = 2X + 5 will lie on the line.


THE REGRESSION LINE 


Observational data, like those of Figure 7.4, never behave as 
nicely as mathematical data, and we can, as usual in statistics, deal 
only with the “general tendency." 

What line can we construct which will fairly summarize the linear 
tendency of the dots in Figure 7.4? We might use a ruler, and, by 
inspection, draw a line as close to the various dots as possible. Such 
a graphical method would have some merit, but it would not give
uniform results. Different individuals would construct different lines.

There are several mathematical methods by which the equation 
of the line that “best fits” correlated data like those of Figure 7.4 
might be found. The standard method used in statistics is that of 
“least squares.” The “least squares” method is one that determines 
the equation of the line such that the sum of the squares of the vertical
distances of the dots from the line is a minimum. (One of the 76
distances, d, is shown in Figure 7.4.) The general equation of the
“least-squares” line is, in deviation score form,

$$ y' = \frac{\sum xy}{\sum x^2}\, x, \tag{7.5} $$

which may also be written

$$ y' = r_{xy}\,\frac{s_y}{s_x}\, x. \tag{7.6} $$

The quantity $\sum xy/\sum x^2$ or $r_{xy}s_y/s_x$ is called the coefficient of regression
of Y on X and is customarily written $b_{yx}$. The line whose equation is
given in (7.5) or (7.6) is called the line of regression of Y on X.

Fig. 7.4. The line of regression of semester averages on Regents’ averages in
a sample of 76 college freshmen. (Data from Table A, Appendix.) [Axes:
Regents’ Average, horizontal; Semester Average, vertical.]

When the deviation scores are turned back to raw scores, equation 
(7.6) becomes 


$$ Y' = r_{xy}\,\frac{s_y}{s_x}\,(X - \bar{X}) + \bar{Y}. \tag{7.7} $$


In the above equations the symbols y' and Y' are used instead of 
y and Y because the equations give the values which Y would have for 
given values of X if the data fell exactly in a straight line; i.e., the 
primes indicate the values Y would have if X and Y were perfectly 
correlated. Thus, the Y' and y' are theoretical values.

All that is needed to write the regression equation for a given set 
of bivariate data are the means, standard deviations, and the cor- 
relation coefficient. Returning to the data of Table 7.3, we have 
previously found that $\bar{X} = 87.16$, $\bar{Y} = 75.19$, $s_x = 4.97$, $s_y = 7.24$,
and $r_{xy} = .679$. Substituting in equation (7.7), we have

$$ Y' = .679\left(\frac{7.24}{4.97}\right)(X - 87.16) + 75.19 = .989X - 11.01. $$


This is the equation of the line of regression of Y on X for the data in 
hand. The line is shown in Figure 7.4. The slope of the line or the 
regression coefficient is .989. We shall consider the practical uses of 
the regression equation a little later. 
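The substitution into formula (7.7) can be sketched in a few lines of code. The function name `regression_from_summary` is hypothetical and the block is only an illustration of the arithmetic, using the five summary statistics quoted above.

```python
def regression_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Slope and intercept of the regression of Y on X, following formula (7.7).

    Y' = r * (sd_y / sd_x) * (X - mean_x) + mean_y, rearranged as Y' = b*X + a.
    """
    slope = r * sd_y / sd_x               # the regression coefficient b_yx
    intercept = mean_y - slope * mean_x   # value of Y' when X is zero
    return slope, intercept


# Summary statistics for the Regents' (X) and semester (Y) averages.
b_yx, a = regression_from_summary(87.16, 75.19, 4.97, 7.24, 0.679)
print(f"Y' = {b_yx:.3f}X {a:+.2f}")
```

Carried at full precision, the intercept is about -11.02; the value -11.01 in the text results from rounding the slope to .989 before multiplying by the mean of X.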

If, in Figure 7.4, the “least-squares” line had been determined so as 
to make the sum of the squares of the horizontal distances of the dots 
from the line a minimum, its equation in raw score form would have 
been 


$$ X' = r_{xy}\,\frac{s_x}{s_y}\,(Y - \bar{Y}) + \bar{X}, \tag{7.8} $$

which is the equation of the line of regression of X on Y. The quantity
$r_{xy}s_x/s_y$ is called the coefficient of regression of X on Y and is written $b_{xy}$.

Both the Y on X and the X on Y regression coefficients and
equations are important in statistical theory, but in practical work only
one is of concern. When one of the variables is dependent,
or is assumed to be dependent, on the other, only one of the regression
equations has practical meaning. If, for example, the relation between
scholastic aptitude and college grades were being investigated, grades
would naturally be considered the dependent variable, and the regression
of grades on aptitude would be of concern. It is conventional to
designate the dependent variable Y and the independent variable X.
In this case the equation of the regression of Y on X, as given by
(7.5), (7.6), or (7.7), is the equation of practical concern.


SCATTER ABOUT THE REGRESSION LINE.
THE STANDARD ERROR OF ESTIMATE

The relationship between two variables is not adequately described
by a regression equation. Sets of data, like those of Figure 7.5,
yield similar regression lines, yet are characterized by dissimilar
degrees of relationship.

Linearly related data tend both to fall along a straight line and to
scatter about the line; hence, both line and scatter need to be considered
in summarizing relationship. The need is analogous to the need for
measures of both central tendency and variability in describing a
frequency distribution; in fact, the regression line is a sort of “mean
line,” as will be brought out in connection with prediction.

Fig. 7.5. Data characterized by similar lines of regression of Y on X, but
dissimilar scatter about the lines.


Consider again the data of Figure 7.4. For each X there is a given
Y. There is also a theoretical value, Y', for each X which would obtain
if X and Y were perfectly correlated. This is the Y' of the equation
Y' = .989X - 11.01. The values of Y' fall on the regression line, and the
differences between the Y's and the corresponding values of Y' are the
deviations of the Y's from the line of regression. These differences or
deviations may appropriately be called residuals. If we find the standard
deviation of the residuals, we shall have a measure of the scatter about
the line of regression.

The standard deviation of the residuals is written $s_{est.}$ or $s_{y.x}$, and
is read the standard error of estimate or the standard error of Y
independent of X. There is a simple formula for $s_{y.x}$:

$$ s_{y.x} = s_y\sqrt{1 - r_{xy}^2}. \tag{7.9} $$

Applying the formula to the data of Figure 7.4, where $s_y$ is 7.24 and
$r_{xy}$ is .679, we have

$$ s_{y.x} = 7.24\sqrt{1 - (.679)^2}, $$

so that $s_{y.x} = 5.32$.

If we actually found the values of Y' by substituting successive values of
X in the equation Y' = .989X - 11.01, subtracted them from the
corresponding Y's, and found the standard deviation of the differences
or residuals, we would obtain 5.32. It is to be remembered that the
standard error of estimate, $s_{y.x}$, is the standard deviation of the residuals
about the line of regression of Y on X. We shall find several uses for
$s_{y.x}$ in later sections.
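The equivalence between formula (7.9) and the standard deviation of the residuals can be checked on a small set of scores. The sketch below is illustrative only: the six score pairs are invented, the function name is hypothetical, and standard deviations are computed with N in the denominator, which is assumed to match the book's definition of s.

```python
from math import sqrt
from statistics import fmean, pstdev


def standard_error_of_estimate(sd_y, r):
    """Formula (7.9): s_y.x = s_y * sqrt(1 - r**2)."""
    return sd_y * sqrt(1.0 - r * r)


# The textbook example: s_y = 7.24, r_xy = .679.
print(round(standard_error_of_estimate(7.24, 0.679), 2))

# Invented scores, used only to compare the two routes to s_y.x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
n = len(xs)
mean_x, mean_y = fmean(xs), fmean(ys)
sd_x, sd_y = pstdev(xs), pstdev(ys)
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)

# Route 1: find each Y', subtract it from Y, take the SD of the residuals.
residuals = [y - (r * sd_y / sd_x * (x - mean_x) + mean_y) for x, y in zip(xs, ys)]
direct = pstdev(residuals)

# Route 2: apply formula (7.9).
by_formula = standard_error_of_estimate(sd_y, r)
print(round(direct, 4), round(by_formula, 4))
```

The two routes agree to within rounding error, which is what the formula asserts.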


The Regression Equation in Prediction 


It is frequently useful to estimate or predict the theoretical score 
of an individual in one variable, given his score in a related variable. 
For example, if it is known that scores on an aptitude test will later 
be correlated with success in a certain occupation, the aptitude test 
scores may be used in predicting the success of given individuals in 
the occupation. 

The idea of prediction, in this sense, naturally follows regression. 
In the prediction situation, the dependent variable Y is customarily 
referred to as the criterion variable and the independent variable X 
as the predictor variable. In the above example, aptitude test per- 
formance is the predictor; success in the occupation the criterion. 
The nature of prediction can be brought out most easily and clearly 
by reference to a practical problem. 




STATISTICAL PREDICTION 

Let us think about the work of the Admissions Committee at the 
college from which the data of Table A, Appendix, were obtained. 
The first job of the Committee is, of course, that of selecting high 
school graduates who show promise of satisfactory achievement in 
college and screening out those who do not. Each applicant for the 
freshman class submits various evidence in support of his application, 
such as rank in class, recommendations from his secondary school
official, College Entrance Examination Board scores, and the like, and, 
if he has attended a New York State high school, Regents’ examina- 
tion scores. 

Any of these variables which is correlated with success in college 
is useful to the Committee in estimating the chances that a given 
applicant, if admitted, will in fact succeed. For example, if appli- 
cants of future years are like the group from which the freshmen of 
Table 7.3 were selected, i.e., are members of the same population of 
potential freshmen, the relationship between their Regents' averages 
and semester averages will be comparable to the relationship existing 
in the data of Table 7.3. The equation of the line of regression of 
semester averages on the Regents' averages is, as derived in the preceding
section, Y' = .989X - 11.01. This regression equation
gives the best estimate of the semester average that will be made by an
admitted applicant having a particular Regents' average. By use of
the equation, we would predict a semester average Y' of 78 for an
applicant having a Regents' average of 90.0, since we would have
Y' = .989 × 90.0 - 11.01. This is the best estimate of the semester
average which will in fact be paired with a Regents’ average of 90.0, 
provided that what has been true in the past holds true in the future. 
By similar use of the regression equation, the semester average theo- 
retically paired with any given Regents' average can be predicted. 

It will be seen that predicted scores are merely the theoretical
scores Y' of the preceding section. In any situation in which we have
been able to determine the relationship between a predictor and a
criterion variable, it is possible to use the regression equation in
prediction. The important condition underlying the procedure is that
the individuals for whom predictions are being made are members of
the same population as the sample in which the relationship was
originally determined. It is furthermore necessary that the factors
affecting performance on predictor and criterion variables remain 
constant, or nearly so. If the conditions are not met, the observed 
relationship cannot be expected to obtain in the future. 




ACCURACY OF PREDICTION 


If the correlation between Regents' and semester average scores 
were perfect, the work of the Admissions Committee, considered 
above, would be simple and pleasant indeed. Unfortunately, the 
semester averages scatter about the regression line, so that various 
averages are associated with a particular Regents' average. Can the 
extent of the scatter be estimated and allowed for in prediction? 

The answer is yes, provided we are able to make three assump- 
tions. The first is that the relationship between predictor and criterion 
variables is linear. The second is that the scatter of criterion scores 
in one column of the correlation table is equal to the scatter in any 
other, or would be if enough cases were added to smooth sampling 
irregularities. In our present example, this assumption means that in 
the population of freshmen, past, present, and future, from which the 
76 freshmen are a sample, the semester averages which are, or will be, 
associated with a given Regents' average will show the same scatter 
as those associated with any other. Technically, this is known as the 
assumption of homoscedasticity (homo means “like” and scedasticity 
means “scattering”). If the relationship is linear, of course, like 
scattering in columns results in equal scattering about the line of 
regression. 

The third assumption we must make in order to judge the ac- 
curacy of prediction is that the differences between predicted and 
observed criterion scores (semester averages in our example) will be 
distributed normally in each column. This amounts to the assumption 
that the observed criterion scores will be distributed normally in each 
column. 

Under the assumptions of linearity, homoscedasticity, and normality
of criterion scores, the standard error of estimate, $s_{y.x}$, defined
in the preceding section, enables us to gauge the accuracy of prediction.
In Chapter VI we learned that standard deviation unit distances
or z intervals include fixed proportions of normally distributed scores.
Since $s_{y.x}$ is the standard deviation of the errors of estimate (differences
between observed and predicted criterion scores), we may determine
the proportion or percentage of criterion scores falling within $s_{y.x}$ unit
distances of their predicted value, and hence the chances that a criterion 
score, yet to be observed, will fall within or outside of some specified 
interval. 

The manner in which criterion scores will be distributed about
the regression line, if the conditions and assumptions underlying
prediction are satisfied, is depicted in Figure 7.6. About 68 per cent of
the criterion scores in any column will presumably fall within $\pm 1s_{y.x}$
of their predicted value; about 95 per cent within $\pm 2s_{y.x}$; and
practically all within $\pm 3s_{y.x}$.

Fig. 7.6. Distribution of criterion scores in columns of the correlation table,
if the assumptions of homoscedasticity, normality, and linearity are satisfied.
[Bands marked in the figure: $\pm 1s_{y.x}$, 68.3% of the scores; $\pm 2s_{y.x}$,
95.4% of the scores; $\pm 3s_{y.x}$, 99.7% of the scores.]

Returning to the problem of predicting freshman semester averages
from Regents’ averages where, by previous calculations, $s_{y.x}$ is
5.32, we may reasonably expect that about 68 per cent of the semester
averages yet to be earned will fall within ±5.32 of their predicted
value; about 95 per cent within ±10.64 of their predicted value; and
so on.

The interval in which a given percentage of scores is expected to fall
may be regarded as an approximate confidence band for scores yet to be
earned, i.e., future scores. If we let $Y_f$ represent a future score, we
may write

68% confidence band for $Y_f$: $Y' \pm 1s_{y.x}$
95% confidence band for $Y_f$: $Y' \pm 2s_{y.x}$
99.7% confidence band for $Y_f$: $Y' \pm 3s_{y.x}$



Any particular confidence band for $Y_f$ may readily be deduced
from Table C, Appendix. For example, the 80 per cent band must
include the middle 80 per cent of the area under the normal curve,
40 per cent on either side of the center. According to Table C, the z
score corresponding to .3997, as close as we can come to .40 without
interpolation, is 1.28 or 1.3. Hence, the 80 per cent confidence band for
$Y_f$ is $Y' \pm 1.3s_{y.x}$. By similar procedure, we would find the 90 per cent
confidence band to be $Y' \pm 1.6s_{y.x}$.
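The table lookup just described can be mimicked with the inverse of the normal curve. The sketch below is an illustration that uses Python's standard-library `statistics.NormalDist` in place of Table C; the function names `band_multiplier` and `confidence_band` are hypothetical.

```python
from statistics import NormalDist


def band_multiplier(confidence):
    """z such that the middle `confidence` share of a normal curve lies within +/- z."""
    # Half of the excluded area lies in each tail, so look up 0.5 + confidence / 2.
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def confidence_band(predicted, s_est, confidence):
    """Approximate band Y' +/- z * s_y.x for a score yet to be earned."""
    z = band_multiplier(confidence)
    return predicted - z * s_est, predicted + z * s_est


for level in (0.68, 0.80, 0.90, 0.95):
    print(f"{level:.0%} band: Y' +/- {band_multiplier(level):.2f} s_y.x")

# 80 per cent band around a predicted semester average of 78.0, with s_y.x = 5.32.
low, high = confidence_band(78.0, 5.32, 0.80)
print(round(low, 1), round(high, 1))
```

The multipliers agree with the rounded table values (1.3 for the 80 per cent band, 1.6 for the 90 per cent band) to one decimal place. Like the table method, the sketch assumes linearity, homoscedasticity, and normally distributed criterion scores.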

Just what confidence band to use in judging or reporting the 
accuracy of predicted scores depends on how sure we want to be that 
a score, yet to be observed, will in fact fall in the band. Often the
chances that a score will fall below (or above) some given value are of
more concern than a confidence band. Such chances may be deduced
from the appropriate confidence band. For example, since 68 per
cent of the scores yet to be earned are expected to fall within the band
$Y' \pm 1s_{y.x}$, 32 per cent are expected to fall outside, 16 per cent on
either side of the band. Hence, the chances are 16 in 100 that a score
will fall at or below $Y' - 1s_{y.x}$ and 16 in 100 that a score will fall at
or above $Y' + 1s_{y.x}$.

It is generally more convenient, however, to estimate the chances 
directly without bothering with confidence bands. Returning to the 
example, we would predict a semester average of 68.1 for an applicant
having a Regents' average of 80.0, since we would have
Y' = .989(80.0) - 11.01. If this applicant is admitted, what are the chances
that he will actually earn a semester average of 60 or less? Remembering
that, for this example, $s_{y.x}$ is 5.32, we have the z score,
z = (60 - 68.1)/5.32 = -1.52. Consulting Table C, we find that the area
below -1.52 is .064. Hence, the chances are 64 in 1000 or about 6 in
100. In other words, if 1000 applicants were admitted having a Re- 
gents' average of 80.0 and, consequently, a predicted semester average 
of 68.1, 64 would be expected to earn averages of 60 or less. (See 
Figure 7.7.) 

By exactly the same procedures, we can determine the chances 
that any semester average will deviate by more than some specified 
amount from its predicted value, or that it will fall within or outside 
of a given band. 
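The direct calculation of such chances can be sketched as follows. The block is illustrative: it substitutes the standard-library `statistics.NormalDist` for Table C, and the function name `chance_at_or_below` is hypothetical.

```python
from statistics import NormalDist


def chance_at_or_below(cutoff, predicted, s_est):
    """Chance that an observed criterion score falls at or below `cutoff`.

    Assumes scores are normally distributed about the predicted value
    with standard deviation s_est (the standard error of estimate).
    """
    z = (cutoff - predicted) / s_est
    return NormalDist().cdf(z)


# Applicant with a Regents' average of 80.0.
predicted = 0.989 * 80.0 - 11.01          # about 68.1
p = chance_at_or_below(60.0, predicted, 5.32)
print(f"predicted = {predicted:.1f}, chance of 60 or less = {p:.3f}")
```

The result is close to the .064 read from the table; small differences arise because the table method rounds z to two decimal places.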

Although our description of the method of judging the accuracy 
of prediction has been confined largely to a particular sample, the 
method is general, provided the underlying conditions and assumptions
are met. It should be noted, however, that the method gives
only approximate results. The standard error of estimate, $s_{y.x}$, in a
given sample tends to underestimate the extent of scatter of the residuals
about the regression line in further samples. The bias can be
removed by multiplying $s_{y.x}$ by $\sqrt{N/(N - 2)}$, but there is no simple,
practical way to make allowance for the sampling errors in the statistics
used in the regression equation or for the fact that the accuracy
of prediction of Y from X is not quite the same for all values of X.
Despite these sources of error, the simple method of judging the
accuracy of prediction, described above, is accurate enough for practical
purposes, provided the sample is large, say not less than about
50. (See Ex. 24.)

Fig. 7.7. Number of criterion scores in 1,000 expected to fall 1.52 or more
standard errors of estimate below their predicted value.
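As a rough illustration of how little the bias correction matters in a sample of this size, it may be applied to the running example, taking N = 76 and $s_{y.x} = 5.32$ from the earlier sections (the arithmetic below is ours, not the author's):

$$ s_{y.x}\sqrt{\frac{N}{N - 2}} = 5.32\sqrt{\frac{76}{74}} \approx 5.32\,(1.0134) \approx 5.39. $$

The corrected value exceeds the uncorrected one by little more than one per cent, which is consistent with the remark that the simple method is adequate when the sample is not less than about 50.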

Let us comment, parenthetically, upon a rather common criticism 
of statistical prediction. The criticism is based upon the fact that,
although it may be known that a certain percentage of individuals
for whom predictions are made will in fact make criterion scores
below (or above) a particular point, there is no way of telling which
particular individuals will do so. Such uncertainty, however, is by
no means confined to statistical prediction. It perplexes all attempts
to foretell whether a particular individual will succeed or fail. The
best that can be done is to state the individual's chances in terms of
the proportions of comparable individuals who have succeeded or
failed. The criticism stems from the human desire for certainty in a
field where none exists. It is impressive only because it suggests that
when decisions which affect individuals must be made, such as admission
to a college or to a vocation, the decisions should be made in the
light of as much relevant information about each individual as can be
obtained, in addition to predicted criterion scores.




SUMMARY 


Two conditions are involved in using the regression equation in 
prediction. The first condition is that individuals for whom predic- 
tions are made are similar in all relevant respects to those upon whose 
past performance the regression equation is based; the second, that 
the factors affecting the predictor and criterion scores remain constant, 
or nearly so. 

Three assumptions underlie the estimation of the accuracy of 
prediction: (1) linearity of regression, (2) equal scatter of criterion 
scores in the columns of the correlation table, and (3) normal dis- 
tribution of criterion scores in the columns. The first two assumptions 
taken together mean equal scatter of criterion scores about the re- 
gression line. 

The logic underlying statistical prediction and the estimation of 
the accuracy of statistical prediction includes these points: 

a. Correlation between a criterion and a predictor variable has been ob- 
served in a sample from a specified population. 

b. The relationship will hold true in further samples from the same 
population. 

c. Hence criterion scores, yet to be observed, will in fact pair with 
predictor scores in essentially the same way as they have in the past. 

d. Hence the accuracy of predicted criterion scores can be estimated from 
the extent to which past criterion scores have scattered about the re- 
gression line. 

Since prediction is the forecasting of criterion scores yet to be 
observed, the soundness of the assumptions can be judged only from
past experience. Ordinarily they are considered sound if the two 
conditions of prediction are met and if, in the past, regression has been 
linear and observed criterion scores distributed normally. It should be 
emphasized, however, that all conditions and assumptions under- 
lying prediction and the reliability of prediction should be continuously 
scrutinized in the light of the information derived from follow-up 
studies. The matter is well summed up in these words from an editorial 
on psychological tests (Ref. 38): 

All informed predictions of future performance are based upon 
some knowledge of relevant past performance: school grades, research 
productivity, sales records, batting averages, or whatever is appropriate. 
How well the predictions will be validated by later performance depends 
upon the amount, reliability, and appropriateness of the information used 
and on the skill and wisdom with which it is interpreted. Anyone who 
keeps careful score knows that the information available is always in- 
complete and that the predictions are always subject to error. 




In a later section we shall extend the ideas discussed above to the 
problem of predicting criterion scores from two related variables. 
It is customary in statistical prediction to use at least two independent 
variables. 


Analysis and Interpretation of Relationship 


When $r_{xy}$ is used to analyze the relationship between two variables
or to predict one from the other, ordinarily its most comprehensive 
interpretation is based on variances and regression theory. 

It will be recalled that the variance is the square of the standard 
deviation and hence is a measure of scatter or variability. Thus, $s_x^2$ is
a measure of the variability of X, and $s_y^2$ is a measure of the variability
of Y. Often in research work we are interested in knowing how much
of the variance of a dependent variable Y is explained by or accounted
for by variation in an independent variable X. For example, we find
$r_{xy}$ between aptitude and achievement to be .45, and we ask how
important aptitude is in accounting for variation in achievement, i.e.,
we ask to what extent individual differences in aptitude account for
individual differences in achievement. Let us see how such questions 
are answered. 


RELATION OF rxy TO VARIANCES 


The standard error of estimate is, by formula (7.9), 


$$ s_{y.x} = s_y\sqrt{1 - r_{xy}^2}. $$

As we have seen, $s_{y.x}$ is the standard deviation of the differences
between the observed values of Y and the theoretical values obtained
from the regression equation of Y on X. It is a measure of the dispersion
or scatter of the observed Y's about the regression line, and
hence indicates the extent to which variables other than X are
influencing Y.

By squaring both members of the above expression, we obtain

$$ s_{y.x}^2 = s_y^2\left(1 - r_{xy}^2\right), \tag{7.10} $$

in which $s_{y.x}^2$ is the variance error of estimate and $s_y^2$ the variance of
the observed Y's. Solving for $r_{xy}^2$, we get

$$ r_{xy}^2 = \frac{s_y^2 - s_{y.x}^2}{s_y^2}. \tag{7.11} $$



We may arrive at a second expression for $r_{xy}^2$ in terms of variances.
The regression equation of Y on X may be written $y' = r_{xy}(s_y/s_x)x$,
in which y' is the predicted score in deviation form. Squaring, summing,
and dividing by N, we obtain

$$ \frac{\sum y'^2}{N} = r_{xy}^2\,\frac{s_y^2}{s_x^2}\cdot\frac{\sum x^2}{N} = r_{xy}^2 s_y^2, $$

since $\sum x^2/N = s_x^2$. Now $\sum y'^2/N$ is the variance of the predicted
scores, i.e., the variance of Y which results from variation of X.
Designating this variance $s_{y'}^2$, and solving for $r_{xy}^2$, we have

$$ r_{xy}^2 = \frac{s_{y'}^2}{s_y^2}. \tag{7.12} $$

From equations (7.11) and (7.12) it follows that $s_y^2 = s_{y'}^2 + s_{y.x}^2$.
This is an instructive relationship. It tells us that when X and Y are 
correlated, the total variance of Y is equal to the variance predictable 
from or accounted for by X plus the variance due to factors other 
than X, i.e., the variance not accounted for by X. 

It is evident from (7.12) that the proportion of the total variance 
of Y accounted for by X is equal to the square of the correlation 
coefficient, $r_{xy}^2$. Let us apply this fact to the data of Table 7.3. The
coefficient of correlation between Regents’ averages and semester 
averages is .679. Hence, (.679)? or about 46 per cent of the variance of 
semester averages is accounted for by the variation in Regents’ averages. 
It follows that about 54 per cent of the variance of semester averages 
must be attributed to variables other than Regents’ averages. 
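The partition of the variance of Y into a part accounted for by X and a part not accounted for by X can be verified on any set of paired scores. The sketch below is illustrative only: the aptitude and achievement scores are invented, the function name `variance_breakdown` is hypothetical, and variances are computed with N in the denominator, which is assumed to match the book's definition.

```python
from statistics import fmean, pvariance


def variance_breakdown(xs, ys):
    """Split the variance of Y into predicted and residual parts.

    Returns (var_y, var_predicted, var_residual, r_squared).
    """
    n = len(xs)
    mean_x, mean_y = fmean(xs), fmean(ys)
    var_x, var_y = pvariance(xs), pvariance(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    slope = covariance / var_x                       # regression coefficient b_yx
    predicted = [slope * (x - mean_x) + mean_y for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    r_squared = covariance ** 2 / (var_x * var_y)
    return var_y, pvariance(predicted), pvariance(residuals), r_squared


# Hypothetical scores, invented purely for illustration.
aptitude = [52, 55, 61, 64, 70, 73, 78, 85]
achievement = [60, 58, 66, 71, 69, 80, 77, 88]

var_y, var_pred, var_res, r2 = variance_breakdown(aptitude, achievement)
print(f"total variance of Y      = {var_y:.3f}")
print(f"predicted plus residual  = {var_pred + var_res:.3f}")
print(f"r squared                = {r2:.3f}")
print(f"predicted / total        = {var_pred / var_y:.3f}")

# The textbook figure: r_xy = .679 gives about 46 per cent explained variance.
print(round(0.679 ** 2, 3))
```

The first two printed values agree, as do the third and fourth, which is the content of the relationship derived from (7.11) and (7.12).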

When $r_{xy}^2$ is used to determine the proportion of variance of the
dependent variable which can be attributed to variation of the independent
variable, it is sometimes called the coefficient of determination
and the quantity $1 - r_{xy}^2$ the coefficient of nondetermination. Notice
that this interpretation of $r_{xy}^2$ is confined to variances; it does not
extend to other measures of variability. 

Since the importance of an $r_{xy}$, in the sense of indicating the
percentage of explained variance, depends on its square, it follows 
that two or more coefficients compare in importance as their squares. 
Thus, coefficients of .90 and .30 compare as 9 to 1; i.e., a coefficient of 
.90 is 9 times as important as a coefficient of .30, in the sense that it 
indicates 9 times as much explained variance. 




RELATION OF $r_{xy}$ TO EFFICIENCY OF PREDICTION


A second interpretation of $r_{xy}$ is possible by reference to the
regression equation and the standard error of estimate, as they are used
in prediction. From the general regression equation (7.7), we note
that when $r_{xy} = 0$, we would obtain a $Y'$ equal to $\bar{Y}$ for any value
of X. In other words, if $r_{xy} = 0$, knowledge of X would be of no
help in predicting Y. If $r_{xy} = \pm 1.00$, all of the Y's lie on the
regression line. In this case, to know X would be to know Y.

The value of a given $r_{xy}$ in prediction is reflected by the standard
error of estimate $s_{y.x}$, a fact we have already made use of in judging
the reliability of predicted scores. By an extension of the concept
of $s_{y.x}$ as a measure of the dispersion of the observed Y's about the
line of regression, we can arrive at a useful and general method of
judging the predictive value of an $r_{xy}$. Formula (7.9) obviously may
be written

$$\frac{s_{y.x}}{s_y} = \sqrt{1 - r_{xy}^2} \qquad (7.13)$$


It is evident from (7.13) that the quantity $\sqrt{1 - r_{xy}^2}$ indicates the
proportion of total scatter of Y (as measured by the standard deviation)
remaining in any column of the correlation table, provided the
assumptions of homoscedasticity and linearity are satisfied, and thus
measures the extent to which a given $r_{xy}$ aids in prediction. When
$r_{xy} = 0$, the ratio $s_{y.x}/s_y$ equals 1 and $s_{y.x} = s_y$. In this case the Y's
scatter about the regression line as much as they scatter about their
own mean. As $|r_{xy}|$ increases, the ratio decreases, although not
proportionately. When $r_{xy} = \pm 1$ the ratio is 0, and $s_{y.x}$ is 0. This means
that the observed scores fall on the regression line; i.e., the observed
and predicted values of Y are identical, so that there is no error of
estimate.

The values of $\sqrt{1 - r_{xy}^2}$ corresponding to selected values of $r_{xy}$
are shown in Table 7.4. This is an instructive table. It tells us, for
example, that when $|r_{xy}| = .866$ the ratio of $s_{y.x}$ to $s_y$ is about .50.
In other words, when $|r_{xy}| = .866$ the dispersion of the Y's about
the line of regression is about 50 per cent as much as their dispersion
about $\bar{Y}$.

TABLE 7.4

Selected Values of $|r_{xy}|$ and Corresponding Values of $\sqrt{1 - r_{xy}^2}$

$|r_{xy}|$    $\sqrt{1 - r_{xy}^2}$      $|r_{xy}|$    $\sqrt{1 - r_{xy}^2}$
  .00           1.000                      .80            .600
  .10            .995                      .866           .500
  .20            .980                      .900           .436
  .30            .954                      .925           .380
  .40            .917                      .950           .312
  .50            .866                      .975           .222
  .60            .800                      .990           .141
  .70            .714                     1.000           .000

In general, the quantity $\sqrt{1 - r_{xy}^2}$ is an index of the predictive
value of the correlation coefficient. For a practical application, we
return to the Regents' and semester averages of Table 7.3. For those
data $r_{xy} = .68$, so that $\sqrt{1 - r_{xy}^2} = \sqrt{1 - (.68)^2} = .73$. Hence,
the dispersion of the semester averages about the regression line is
about 73 per cent as much as their dispersion about their own mean.

The quantity $\sqrt{1 - r_{xy}^2}$ is commonly known as the coefficient of
alienation and is designated by the letter k. The quantity 1 — k is 
sometimes called an index of the efficiency of prediction. 
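
The entries of Table 7.4 and the value .73 for the Regents' data can be recomputed with a few lines of Python. This is only a sketch; the function name `coefficient_of_alienation` is ours, chosen for illustration.

```python
import math


def coefficient_of_alienation(r):
    """Return k = sqrt(1 - r^2) for a correlation coefficient r."""
    return math.sqrt(1 - r ** 2)


# Reproduce a few rows of Table 7.4.
for r in (0.00, 0.50, 0.866, 0.90, 0.99):
    print(f"|r| = {r:.3f}   k = {coefficient_of_alienation(r):.3f}")

# Regents' and semester averages, r = .68.
k = coefficient_of_alienation(0.68)
efficiency = 1 - k  # index of the efficiency of prediction
print(f"k = {k:.2f}, 1 - k = {efficiency:.2f}")
```

Note how slowly k falls as r rises: a coefficient of .866 is needed before the scatter about the regression line is halved.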


THE EFFECT OF VARIABILITY OF DATA ON $r_{xy}$


One of the most important things to keep in mind in interpreting 
$r_{xy}$ is its sensitivity to variability in the bivariate sample. If we were
to find the correlation between height and weight in a group heter- 
ogeneous in respect to either height or weight, the coefficient would be 
substantially greater than in a relatively homogeneous group. If we 
gave, say, a vocabulary test to a group extremely variable in mental 
age, we would find the correlation between vocabulary and mental 
age greater than in a group less variable in mental age, other things 
being equal. 

As a case in point, consider the correlation between the Re- 
gents’ averages and college freshmen averages shown in Table 7.3. 
The two are correlated with $r_{xy} = .68$. Now these freshmen were
selected partly on the basis of Regents’ averages; hence, those ad- 
mitted to college were less variable in Regents’ averages than New 
York State high school graduates at large. Suppose that the standard 
deviation of Regents’ averages, at large, is about 10.0, as compared 
to a standard deviation of about 5.0 for our selected freshmen. Can 
an adjustment or correction be made for the effect of the restriction 
or curtailment of variability upon $r_{xy}$?




The answer is yes, provided we can make two assumptions. The 
first is that the regression of semester averages on Regents’ averages, 
at large, would be linear; the second that the semester averages would 
be homoscedastic or equally scattered in columns. Under these 
assumptions it can be shown that the correlation between X and Y in 
the unselected or larger group may be estimated by 


$$r'_{xy} = \frac{r_{xy}(s'_x/s_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2 (s'_x/s_x)^2}} \qquad (7.14)$$


in which $r_{xy}$ is the correlation between variables X and Y in the
restricted group; $s_x$ is the standard deviation of the restricted group in
the X variable upon which selection was made; $s'_x$ is the standard
deviation of the larger group in the X variable; and $r'_{xy}$ is the estimate
of the correlation which would exist between X and Y in the larger
group.

To make use of formula (7.14) in estimating $r'_{xy}$ in our example,
we substitute the values $r_{xy} = .68$, $s_x = 5.0$, $s'_x = 10.0$, and have

$$r'_{xy} = \frac{.68(10.0/5.0)}{\sqrt{1 - (.68)^2 + (.68)^2 (10.0/5.0)^2}} = .88.$$

The value .88 is the estimated or theoretical correlation between
Regents’ averages and semester averages, the effect of selection being
eliminated.
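
The substitution in formula (7.14) can be verified with the short Python sketch below. The function and argument names are ours and purely illustrative; the formula rests on the assumptions of linearity and homoscedasticity stated above.

```python
import math


def correct_for_restriction(r_restricted, s_restricted, s_unrestricted):
    """Estimate the correlation in the larger group, as in formula (7.14)."""
    ratio = s_unrestricted / s_restricted
    denominator = math.sqrt(1 - r_restricted ** 2 + r_restricted ** 2 * ratio ** 2)
    return r_restricted * ratio / denominator


# Values from the example: r = .68, s_x = 5.0 (freshmen), s'_x = 10.0 (at large).
r_estimated = correct_for_restriction(0.68, 5.0, 10.0)
print(f"estimated correlation in the larger group: {r_estimated:.2f}")
```

When the two standard deviations are equal the ratio is 1, the denominator is 1, and the function returns the observed coefficient unchanged.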

Formula (7.14) may be used to estimate $r_{xy}$ for a restricted group
if we know $r'_{xy}$ in the larger group and the respective standard
deviations. When used for this purpose, it is more convenient when
written

$$r_{xy} = \frac{r'_{xy}(s_x/s'_x)}{\sqrt{1 - r'^2_{xy} + r'^2_{xy} (s_x/s'_x)^2}} \qquad (7.15)$$

Although the correction of $r_{xy}$ for effect of variability is mainly of
theoretical interest, the effect of variability upon the magnitude of
$r_{xy}$ is of great practical importance in interpreting $r_{xy}$. Before
coefficients of correlation observed in two or more groups can be fairly
compared, the groups must be comparable in variability. Before we
can generalize information based upon an $r_{xy}$ observed in a given
group to other groups, we must be sure the groups are reasonably
alike in variability, or “range of talent.”




This means that a reported $r_{xy}$, like all other statistical measures,
cannot be interpreted without knowledge of the situation in which it
was observed. In particular, standard deviations are needed in
interpreting $r_{xy}$ and should always be reported. If the standard deviations
are very different from those ordinarily characterizing the variables 
under consideration, the fact should be emphasized. 


$r_{xy}$ AS A MEASURE OF RATE OF CHANGE


In standard score form the equation of the line of regression of
Y on X is

$$z'_y = r_{xy} z_x.$$

Now this equation summarizes the linear relationship between Y
and X, the two being expressed in comparable units. When we examine
it we note that when $z_x$ changes, $z'_y$ increases, remains fixed, or
decreases to an extent that depends entirely upon $r_{xy}$. Thus $r_{xy}$ indicates
the amount of covariation which characterizes two variables, when
they are expressed in comparable units and when regression is linear.
Since X and Y are rarely, if ever, perfectly correlated, $z'_y$ will generally
be less than $z_x$. For example, if $r_{xy} = .50$, $z'_y$ will be only half of $z_x$.
This fact is related to the “law” of regression, or the regression
tendency. The tendency is extremely important in interpreting correlated
data, and we shall consider it at some length.


THE REGRESSION TENDENCY 


Since the time of Galton, it has been recognized that tall parents 
tend to have offspring less tall, and short parents offspring less short, 
than themselves. In Galton's words (Ref. 18, p. 95): 


However paradoxical it may appear at first sight, it is theoretically a 
necessary fact, and one that is clearly confirmed by observation, that the 
Stature of the adult offspring must, on the whole, be more mediocre than 
the stature of their Parents; that is to say, more near to the M [median] 
of the general Population. 


Galton referred to this tendency as the law of regression, and general- 
ized to various hereditary traits in these words (Ref. 18, p. 106): 


The law of Regression tells heavily against the full hereditary trans- 
mission of any gift. Only a few out of many children would be likely to 
differ from mediocrity so widely as their Mid-Parent [average of parents], 
and still fewer would differ as widely as the more exceptional of the two 
Parents. The more bountifully the Parent is gifted by nature, the more 




rare will be his good fortune if he begets a son who is as richly endowed 
as himself, and still more so if he has a son who is endowed yet more 
largely. But the law is even-handed; it levies an equal succession-tax on 
the transmission of badness as of goodness. If it discourages the ex- 
travagant hopes of a gifted parent that his children will inherit all his 
powers; it no less discountenances extravagant fears that they will in- 
herit all his weakness and disease. 

It must be clearly understood that there is nothing in these statements 
to invalidate the general doctrine that the children of a gifted pair are 
much more likely to be gifted than the children of a mediocre pair. They 
merely express the fact that the ablest of all the children of a few gifted 
pairs is not likely to be as gifted as the ablest of all the children of a
very great many mediocre pairs.


The regression tendency is observable in all situations in which
bivariate data are imperfectly correlated. A group of students of
superior academic aptitude are, on the average, less superior in
academic achievement; a group of students of inferior aptitude are, on
the average, less inferior in achievement. A group of highly intelligent
men will be found to be married to less intelligent women, on the
average, and vice versa. Tall men are, as a group, less extreme in
weight than in height. Heavy men or light men are, as a group, less
extreme in height than in weight. These facts are true because, like
traits of parents and offspring, academic aptitude and achievement,
intelligence of married pairs, and height and weight are imperfectly
correlated. In general, imperfectly correlated measures show
regression toward the mean. This is an important “law” for research workers,
teachers, and counselors, and we shall illustrate it, using the data of
Table 7.3. In that table the 11 freshmen having scores in the column
whose mid-value is about 91 have a mean semester average of about
79. The standard score equivalent of the former is about .77 and that
of the latter about .53. In other words, the 11 freshmen are, as a group,
nearer the mean of the semester averages than the mean of the
Regents’ scores. We may note the same regression tendency toward the
mean of the Regents’ scores. For example, the four freshmen having
semester averages in the row whose mid-value is about 88.5 have a
mean Regents’ score of 92.9. The standard score equivalent of the
former is 1.84 and that of the latter 1.17.

It can be shown that the regression tendency always exists when
data are imperfectly correlated. If we select a set of equal or nearly
equal values of one variable, the paired values of the related variable
will, as a group, tend to be less extreme than the former. This
tendency does not mean that all of the paired values will be less extreme;
in fact, it may be that one or more will be more extreme. The majority 
of them, however, will be less extreme, i.e., will tend to show regres- 
sion toward the mean, if comparable units are used. 

The regression tendency tells us that we can expect, as a rule, to 
find exceptional individuals in one trait less exceptional in related 
traits. The reason why, say, students of a given degree of academic 
aptitude exhibit, on the average, a more moderate degree of academic 
achievement is not clear, but so long as aptitude and achievement are 
imperfectly correlated, the tendency is inevitable. The higher the 
correlation, of course, the less marked the tendency. 
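
The tendency can be illustrated with artificial data. The Python sketch below is only an illustration under assumptions of our own: it draws standard scores from a hypothetical bivariate normal population with correlation .50 (these are not the data of Table 7.3), selects the cases that are high on X, and compares the mean standard scores of the selected group on the two variables.

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is repeatable

r = 0.50   # assumed population correlation
n = 20000  # number of simulated pairs

pairs = []
for _ in range(n):
    zx = random.gauss(0, 1)
    # zy shares a proportion r of zx; the rest is independent variation.
    zy = r * zx + math.sqrt(1 - r ** 2) * random.gauss(0, 1)
    pairs.append((zx, zy))

# Select the individuals who are exceptional on X (more than 1 s.d. above the mean).
selected = [(zx, zy) for zx, zy in pairs if zx > 1.0]
mean_zx = sum(zx for zx, _ in selected) / len(selected)
mean_zy = sum(zy for _, zy in selected) / len(selected)

print(f"selected cases: {len(selected)}")
print(f"mean z_x of selected group: {mean_zx:.2f}")
print(f"mean z_y of selected group: {mean_zy:.2f}")
```

With r = .50 the selected group should be about half as extreme on Y as on X, in keeping with the standard-score regression equation; individual cases in the group may still be more extreme on Y than on X.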

Failure to take the regression tendency into account frequently 
leads to spurious research findings and to improper interpretation of 
correlated data. Wallis and Roberts (Ref. 51) give examples of various 
mistakes because of failure to take the tendency into account, and
Thorndike (Ref. 47) discusses at length regression effects in
research problems.


Correlation and Causation 


The fact that correlation may depend upon the extent to which 
one variable is affected by another squarely brings up the question of 
causation, a question which always arises when relationships are 
observed and one which permeates all science. 

The principle of causality is involved and elusive; we shall make 
no attempt to treat it adequately here. But some of its simpler im- 
plications need to be considered before the student can appreciate the 
power of correlational methods in research. 

When we use the term cause, we are ordinarily referring to a suffi-
cient reason for the occurrence of an event, and hence are thinking 
about an orderly and invariable sequence, effect preceded by cause. 
The ordinary interpretation of causal relationship is something like 
this: c and e are related as cause and effect if e occurs after c and if e 
does not occur when c is absent. Hence, when we seek causal relation- 
ships we are really seeking an invariable sequence of events. Such 
relationships are the sine qua non of understanding and controlling 
the environment, since they enable us to explain and predict events. 

The fact of correlation does not demonstrate sequence, and there- 
fore does not indicate which of two related variables is cause, which 
effect. When variable X is correlated with Y, and the correlation is not 




accidental, there are three reasonable interpretations: (1) X is the cause 
or part of the cause of Y, (2) Y is the cause or part of the cause of X, 
and (3) X and Y are caused or partially caused by some third variable 
or set of variables. Correlation does not indicate which one of the 
three interpretations is sound in a given situation; it demonstrates only 
that X and Y are associated. Inferences regarding the direction and 
nature of causation can be made, if at all, only from information 
supplementary to the fact of correlation. 

In spite of this limitation, however, correlation is extremely useful 
in preliminary investigation of causal relationships. It is generally the 
case that variables which are causally related show correlation and that 
variables which do not show correlation are not related causally. Hence, 
the method of correlation serves both to single out variables which may 
be relevant to an observed effect and to eliminate variables which are 
irrelevant.


Special Applications of Product-Moment Correlation 


The product-moment method of correlation, although used prin- 
cipally in the case of continuous data, is applicable both to discrete 
quantitative and qualitative data. In this section we shall consider the 
computation and meaning of the rank-difference, biserial, and fourfold
coefficients of correlation. In a later chapter methods will be presented
for determining whether a sample coefficient is of sufficient size to
demonstrate correlation in the population. This is the major role of
the coefficients in research.


RANK DIFFERENCE CORRELATION


It frequently happens in statistical work that the variables whose
relationships we wish to investigate are available only in order of merit,
importance, or some other quality. In other situations it may be
desirable to assign ranks to individuals who have been originally measured
on two continuous variables, recording ranks instead of actual scores to
indicate performance. If there is correlation between the two variables, 
the ranks will of course tend to correspond. 

The data of Table 7.5 are the order of finishing and the scores of 
11 students on a 150-item statistics test. The data for the first variable 
are already in rank order. The test scores are ranked from 1, smallest, 
to 11, largest. They might equally well be ranked from 1, largest, to 11,
smallest. This would change the sign of the coefficient, not its size.


TABLE 7.5

Order of Finishing a Statistics Test and Scores on the Test of 11 Students

ORDER OF                RANK OF     DIFFERENCE
FINISHING     SCORE     SCORE       BETWEEN RANKS, D     $D^2$
    1           91         4              -3                9
    2          131         7              -5               25
    3          137         9              -6               36
    4          140        11              -7               49
    5          135         8              -3                9
    6          130         6               0                0
    7           79         2               5               25
    8           67         1               7               49
    9           83         3               6               36
   10          138        10               0                0
   11          111         5               6               36
  SUM                                      0              274

The differences between paired ranks and the squares of the differences 
are shown in the last columns of the table. 


To find the rank-difference coefficient of correlation, $r_d$, we
substitute the sum of squares in the formula

$$r_d = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \qquad (7.16)$$

Making the substitution, we get

$$r_d = 1 - \frac{6(274)}{11(121 - 1)} = -.25$$

and conclude that, in this sample, there is some negative correlation be- 
tween order of finishing and scores on the test. 

Had we ranked the scores from 1, largest, to 11, smallest, $r_d$ would
have been +.25, but in view of the changed meaning of the ranks the 
interpretation would have been the same. 
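
The computation for Table 7.5 can be reproduced in Python. The sketch below ranks the scores from 1, smallest, applies formula (7.16), and then repeats the calculation with the ranking reversed. The function name `rank_difference_r` is ours; the simple ranking step used here assumes there are no tied scores, as is the case in Table 7.5.

```python
def rank_difference_r(ranks_x, ranks_y):
    """Rank-difference coefficient: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


order = list(range(1, 12))  # order of finishing, 1 to 11
scores = [91, 131, 137, 140, 135, 130, 79, 67, 83, 138, 111]

# Rank the scores from 1 (smallest) to 11 (largest); valid only without ties.
ascending = sorted(scores)
score_ranks = [ascending.index(s) + 1 for s in scores]

sum_d2 = sum((a - b) ** 2 for a, b in zip(order, score_ranks))
r_d = rank_difference_r(order, score_ranks)
print(f"sum of D^2 = {sum_d2}, r_d = {r_d:.2f}")

# Ranking from 1 (largest) to 11 (smallest) reverses the sign only.
reversed_ranks = [len(scores) + 1 - rank for rank in score_ranks]
r_d_reversed = rank_difference_r(order, reversed_ranks)
print(f"r_d with reversed ranking = {r_d_reversed:.2f}")
```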

Rank-difference correlation can be applied to bivariate data
available only in ranks, such as judges’ ratings; to ranked continuous data,
such as heights and weights or scores on two tests; or in situations where
one variable is ordered and the other measured, as in the example
above. The coefficient, $r_d$, varies from $-1.00$ to $+1.00$. The more $r_d$
departs from 0, the stronger the relationship. Two coefficients can be
compared by squaring them, as in comparing $r_{xy}$’s. Contrary to $r_{xy}$, $r_d$
does not underestimate the relationship in curvilinear data, provided
the curve is monotonic. 

In practical situations ties may occur in one or both sets of ranks,
particularly when ranks are imposed on measures. In the case of ties
it is logical to assign the mean of the ranks tied for to each of the ties,
as illustrated below.

SCORE:  10  12  12  15  15  15  18  20  20  20  20  25
RANK:    1  2½  2½   5   5   5   7  9½  9½  9½  9½  12

The presence of ties in either or both sets of ranks
does not influence the computation of $D^2$, but it does decrease $|r_d|$.
Kendall (Ref. 26) gives a correction for $r_d$ between ranks showing ties,
but in practice the correction is rarely worth making. Even should
both sets of ranks have ties to the extent of the single set in the
illustration above, the correction would lower $|r_d|$ by only one or two
one-hundredths.
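
The rule of assigning to tied scores the mean of the ranks tied for can be sketched in Python as follows. The helper name `mean_ranks` is ours, introduced for illustration; it ranks from 1, smallest.

```python
def mean_ranks(values):
    """Rank values from 1 (smallest), giving tied values the mean of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


scores = [10, 12, 12, 15, 15, 15, 18, 20, 20, 20, 20, 25]
ranks = mean_ranks(scores)
for score, rank in zip(scores, ranks):
    print(f"score {score:2d}   rank {rank:4.1f}")
```

The two scores of 12 occupy ranks 2 and 3 and so each receives 2.5; the four scores of 20 occupy ranks 8 through 11 and each receives 9.5.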

$r_d$ is a useful, easily computed coefficient. It is in fact a
product-moment coefficient of correlation between ranked variables. Had we
treated the ranks of Table 7.5 as raw scores and applied formula (7.3),
we would have obtained the same result. However, $r_d$ has few of the
features of the product-moment coefficient obtained from normally
distributed, continuous data.
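
The equivalence just stated can be checked numerically. In the Python sketch below, `raw_score_r` is our own name for a raw-score product-moment formula, which we assume to be of the same form as formula (7.3); the ranks are those of Table 7.5, which contain no ties.

```python
import math


def raw_score_r(x, y):
    """Product-moment r computed from raw-score sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


order = list(range(1, 12))                      # order of finishing
score_ranks = [4, 7, 9, 11, 8, 6, 2, 1, 3, 10, 5]  # rank of score, Table 7.5

n = len(order)
sum_d2 = sum((a - b) ** 2 for a, b in zip(order, score_ranks))
r_d = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
r_on_ranks = raw_score_r(order, score_ranks)

print(f"rank-difference formula:       {r_d:.4f}")
print(f"product-moment formula on ranks: {r_on_ranks:.4f}")
```

The two results agree exactly only when there are no ties; with ties the rank-difference formula is an approximation to the product-moment coefficient of the ranks.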


PRODUCT-MOMENT BISERIAL CORRELATION 


There are numerous situations in which one of the two variables
whose relationship is of concern can be observed only in two amounts
or categories. For example, suppose we are interested in the
relationship between test intelligence and survival in school of college freshmen
during a particular year. We can, of course, measure test intelligence in
the usual manner, but survival in school is perhaps most defensibly
observed in the two categories, “dropouts” and “stay-ins.” Similarly,
we might be interested in determining whether there is association
between “taking books home” and achievement in school, “reading comic
books” and test intelligence, or “passing a particular item on a test”
and performance on the whole test.

There are many such problems in the social sciences and, although
there are generally better statistical methods than biserial correlation
for dealing with them, it is sometimes desirable to have a measure of
relationship between the variables involved.

Let us see how we can obtain a measure of relationship between a
dichotomous (two-division) and a continuous variable. The data in
Table 7.6 were obtained by giving a 30-item reasoning test to 12
graduate students enrolled in a course in research methods. Four of the
students had studied formal logic as undergraduates. No better
measure of the study of formal logic was available than “had studied”
and “had not studied.” In Table 7.6 “had studied” is assigned a
numerical value of 1, and “had not studied,” a value of 0.

TABLE 7.6

Scores on a Reasoning Test of 12 Graduate Students, 4 of Whom Had Studied Logic

STUDY OF     REASONING
LOGIC X      TEST Y        XY      $X^2$     $Y^2$
   1            27         27        1        729
   0            25          0        0        625
   1            22         22        1        484
   0            20          0        0        400
   0            20          0        0        400
   0            18          0        0        324
   1            18         18        1        324
   0            18          0        0        324
   0            15          0        0        225
   1            12         12        1        144
   0            10          0        0        100
   0            10          0        0        100
SUM 4          215         79        4      4,179

Let us now compute the usual product-moment $r_{xy}$ for the data in
Table 7.6. When we substitute the sums in the last row of the table,
remembering that N = 12, in formula (7.3) we have

$$r_{xy} = \frac{(12 \times 79) - (215 \times 4)}{\sqrt{[12 \times 4 - (4)^2][12 \times 4{,}179 - (215)^2]}}$$

so that $r_{xy} = .25$.
There is a simple formula available for the product-moment
correlation coefficient in the biserial situation,

$$r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y} \sqrt{pq} \qquad (7.17)$$

in which $\bar{Y}_1$ is the mean Y score of the individuals in the upper,
positive, or “1” category; $\bar{Y}_0$ is the mean Y score of the individuals in the
other category; p is the proportion of individuals in the upper, positive,
or “1” category; q the proportion of individuals in the other category,
so that p + q = 1; and $s_y$ is the standard deviation of the continuous
measures Y.

For the above problem,

$$\bar{Y}_1 = \frac{79}{4} = 19.75, \qquad \bar{Y}_0 = \frac{136}{8} = 17.00,$$

$$p = \frac{4}{12} = \frac{1}{3}, \qquad q = \frac{8}{12} = \frac{2}{3},$$

$$s_y = \frac{1}{12} \sqrt{12(4{,}179) - (215)^2} = 5.22,$$

so that

$$r_{pb} = \frac{19.75 - 17.00}{5.22} \sqrt{\left(\tfrac{1}{3}\right)\left(\tfrac{2}{3}\right)} = .25.$$


This product-moment coefficient of biserial correlation is
customarily called point biserial and is indicated by $r_{pb}$ to distinguish it
from another biserial coefficient, $r_b$, which is based upon the
assumption that the dichotomous variable (in the above illustrative problem,
the study of logic) is normally distributed. We shall not consider the
computation of $r_b$ here, since its shortcut estimate is ordinarily used in
practical work. We discuss the estimate in the following chapter under
the section “Test-Item Analysis.” However, $r_{pb}$, which involves no
assumptions, tends to be more reliable and is the coefficient to use where
there is doubt whether normality underlies the dichotomized variable.
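
The agreement between the point-biserial formula and the ordinary product-moment computation for Table 7.6 can be checked with the Python sketch below. The function names are ours and illustrative only; the standard deviation is computed with divisor N, as in the worked example, and `raw_score_r` is assumed to have the same form as formula (7.3).

```python
import math

# Table 7.6: X = 1 if the student had studied logic, else 0; Y = reasoning score.
x = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y = [27, 25, 22, 20, 20, 18, 18, 18, 15, 12, 10, 10]


def point_biserial(x, y):
    """Point-biserial r from the two group means, s_y (divisor N), p, and q."""
    n = len(y)
    upper = [score for flag, score in zip(x, y) if flag == 1]
    lower = [score for flag, score in zip(x, y) if flag == 0]
    p = len(upper) / n
    q = len(lower) / n
    mean_y = sum(y) / n
    s_y = math.sqrt(sum((score - mean_y) ** 2 for score in y) / n)
    mean_1 = sum(upper) / len(upper)
    mean_0 = sum(lower) / len(lower)
    return (mean_1 - mean_0) / s_y * math.sqrt(p * q)


def raw_score_r(x, y):
    """Product-moment r computed from raw-score sums."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    var_x = n * sum(a * a for a in x) - sum(x) ** 2
    var_y = n * sum(b * b for b in y) - sum(y) ** 2
    return numerator / math.sqrt(var_x * var_y)


r_pb = point_biserial(x, y)
r_xy = raw_score_r(x, y)
print(f"point-biserial formula: {r_pb:.2f}")
print(f"raw-score formula:      {r_xy:.2f}")
```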


FOURFOLD CORRELATION 


The fact that the tallies in a correlation table bunch in quadrants I
and III, as in Figure 7.2, when there is positive correlation between the
variables (or in quadrants II and IV when there is negative correlation)
indicates that a coefficient of correlation can be found for data classified
in a 2 X 2, or fourfold table.

Consider the data of Table 7.7. Suppose that we want to find out
whether there is relationship between broken homes and delinquency.
We might proceed as in Table 7.3, and apply formula (7.4). If we did
this, we would obtain a coefficient of .18.

TABLE 7.7

Classification of 100 Boys by Home and Delinquency

                   DELINQUENT     NONDELINQUENT     TOTAL
UNBROKEN HOME          18               47            65
BROKEN HOME            16               19            35
TOTAL                  34               66           100

However, there is a simple formula for the fourfold coefficient, $r_\phi$,

$$r_\phi = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} \qquad (7.18)$$


in which A, B, C, and D are defined as in Figure 7.8. That is, B is the 
frequency in quadrant I, A the frequency in quadrant II, C the fre- 
quency in quadrant III, and D the frequency in quadrant IV. Applying 
the formula to the data of Table 7.7, where A = 18, B = 47, C = 16, 
and D = 19, we find 


$$r_\phi = \frac{47 \times 16 - 18 \times 19}{\sqrt{65 \times 35 \times 34 \times 66}} = .18$$


and conclude that there is some relationship between broken homes and 
delinquency in our sample. In the same manner we could investigate 
other variables, such as sex, race, or health that might be related to 
delinquency. 
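
Formula (7.18) is easily checked for Table 7.7. In the Python sketch below the function name `fourfold_r` is ours, used only for illustration; the four arguments are the cell frequencies A, B, C, and D as defined in the text.

```python
import math


def fourfold_r(a, b, c, d):
    """Fourfold coefficient (BC - AD) / sqrt((A+B)(C+D)(A+C)(B+D))."""
    numerator = b * c - a * d
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Table 7.7: A = 18, B = 47 (unbroken homes); C = 16, D = 19 (broken homes).
r_fourfold = fourfold_r(18, 47, 16, 19)
print(f"fourfold coefficient = {r_fourfold:.2f}")
```

The function divides by zero if any row or column total is 0, in which case no coefficient can be computed from the table.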

        A | B
        --+--
        C | D

Fig. 7.8. The fourfold correlation table for formula (7.18). A, B, C, and D
are absolute frequencies.

$r_\phi$ is as good a measure as any available when both variables are
truly dichotomous. It is widely used in problems like the illustrative
problem and in correlating the responses to pairs of questionnaire
and test items. For example, if a number of pairs of cross-checking
questions are included in a questionnaire, the consistency of
response to any pair of questions may be examined by fourfold point
correlation. Suppose that in a pair of cross-checking questions i and
j, the “yes” and “no” responses of 200 individuals are as follows:

                      QUESTION i
                      no      yes
QUESTION j    yes     30       50      80
              no      70       50     120
                     100      100     200

If the responses were perfectly consistent, only the upper right and lower
left cells would contain frequencies, and $r_\phi$ would be 1.00. As tabulated,
the responses are correlated with $r_\phi = .20$. So far as these questions,
which supposedly ask the same thing, are concerned, the questionnaire
has little dependability.

We shall discuss a somewhat similar application of fourfold corre- 
lation under “Test Item Analysis," Chapter VIII. 

When a quick estimate is desired of the coefficient of correlation
between normally distributed, continuous variables on which
dichotomies are forced, $r_\phi$ is not as satisfactory as the tetrachoric
coefficient. The tetrachoric coefficient assumes normality of both of the
dichotomized variables and gives a good estimate if normality in fact
exists. We shall not discuss the coefficient here. It is difficult to compute
exactly; however, there are aids and shortcuts which ordinarily give
satisfactory results. (See Refs. 8 and 44.)


Relationships Among Three or More Variables 


We have been studying simple product-moment correlation, or the 
correlation between two variables. It is sometimes the case that we 
want to examine the relationships among three or more variables. For 
example, we may want to determine the relationship of high school 
grades and intelligence test scores, taken as a team, to grades in college. 

There are two kinds of problems in studying mutual relationships 
among three or more variables. The first relates to determining the 
correlation between two of the variables, when the influence of the 
other (or others) is eliminated. This is the net or partial correlation 
problem. The second is that of determining the joint relationship of 
two or more variables to a third. This is the multiple regression and 
correlation problem. Although the two kinds of problems can be shown 
to be related, we shall find it simpler to consider them separately. 




PARTIAL CORRELATION 


It is well known that if each of two variables is correlated with a 
third, the relationship between the two variables is affected by the third. 
For example, if we were to measure the mental age and height in a 
group of 100 normal children, spread evenly over a chronological age 
range from 12 to 120 months, we should be likely to find mental age 
and height strongly correlated. This is true because both mental age and 
height in children are strongly correlated with chronological age. As 
another example, hours of study and semester averages in high school 
or college are sometimes found to be negatively correlated. If this 
correlation were accepted at its face meaning, students who are making 
unsatisfactory marks would be advised to study less. When scholastic 
aptitude is taken into account, however, the correlation between hours 
of study and semester averages is substantial and positive. In studying 
the relationship between two variables which presumably is due or 
partly due to the effect of a third variable, ideally only individuals alike 
in respect to the third variable would be selected for the study. In the 
language of experimental research, the effect of the third variable would 
be controlled. This would mean in the examples above that only chil- 
dren alike in chronological age and students alike in scholastic aptitude 
would be included in the correlation studies. 

Unfortunately, the experimental ideal is difficult to realize. For 
one thing, rigorous control on a variable may result in a very small 
sample. For another, it may be necessary or desirable to confine an 
investigation to an intact, heterogeneous group. It is frequently the 
case, particularly in preliminary investigation, that the best that can be 
done is to eliminate by statistical methods the effect of a third variable 
upon the correlation between two others. 

The product-moment coefficient of correlation between two
variables $X_1$ and $X_2$ with the influence of a third variable $X_3$ eliminated by
statistical methods is known as the partial correlation coefficient, and is
written $r_{12.3}$. Partial correlation may be thought of as a special
application of regression theory. We have seen that the deviations from the
line of regression of Y on X indicate the extent to which variables other
than X influence Y. These deviations or residuals are values of Y
independent of X. When we have three variables, $X_1$, $X_2$, and $X_3$, the
residuals or deviations from the line of regression of $X_1$ on $X_3$ represent
values of $X_1$ independent of $X_3$, while the residuals from the line of
regression of $X_2$ on $X_3$ represent values of $X_2$ independent of $X_3$.
Hence, the correlation between the $X_1$ and $X_2$ residuals theoretically
will be independent of the linear effect of $X_3$.

To find a partial correlation coefficient, $r_{12.3}$, we could find the
residuals about the line of regression of $X_1$ on $X_3$ and the residuals
about the line of regression of $X_2$ on $X_3$ and correlate them. However,
there is an easier way. We may find the coefficients of correlation
between $X_1$ and $X_2$, $r_{12}$; between $X_1$ and $X_3$, $r_{13}$; and between $X_2$ and
$X_3$, $r_{23}$; and substitute in the formula

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \qquad (7.19)$$


To illustrate the use of formula (7.19), we shall suppose that the
variables $X_1$ (problem-solving ability), $X_2$ (judgment), and $X_3$ (IQ)
yield the coefficients of correlation $r_{12} = .50$, $r_{13} = .80$, and $r_{23} = .60$.
What is the coefficient of correlation of problem solving with judgment,
with IQ controlled; i.e., what is the value of $r_{12.3}$? Substituting in
(7.19) we have

$$r_{12.3} = \frac{.50 - (.80)(.60)}{\sqrt{1 - (.80)^2}\,\sqrt{1 - (.60)^2}} = .042.$$


We conclude that nearly all of the correlation between judgment and 
problem solving can be explained by the correlation of each with IQ; 
in fact, when we square .042, we find that less than 1 per cent of the 
variance of problem solving, independent of IQ, is accounted for by 
judgment. That is, less than 1 per cent of the variance of the residuals 
about the line of regression of problem solving on IQ is accounted for 
by judgment as measured. 
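
The substitution in formula (7.19) can be confirmed with a short Python sketch. The function name `partial_r` and its argument names are ours, introduced only for illustration.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 from three zero-order coefficients."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2)
    return numerator / denominator


# Problem solving (1), judgment (2), IQ (3).
r12_3 = partial_r(0.50, 0.80, 0.60)
print(f"r12.3 = {r12_3:.3f}")
print(f"proportion of residual variance accounted for = {r12_3 ** 2:.4f}")
```

The other first-order coefficients follow by permuting the arguments; for example, `partial_r(r13, r12, r23)` holds the second variable constant instead of the third.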

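Formulas similar to (7.19) for the partial correlation coefficient
between $X_1$ and $X_3$ when $X_2$ is constant, and for $X_2$ and $X_3$ when $X_1$
is constant, are

$$r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \qquad (7.20)$$

$$r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} \qquad (7.20')$$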


Thus, to determine a partial coefficient of correlation, we need 
only to compute the simple coefficients between pairs of variables and 
substitute in the appropriate formula. 
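
That the formula and the residual method agree can be seen numerically. The following sketch uses made-up scores generated at random (they are not data from this book) and helper names of our own choosing. It correlates the two sets of residuals directly and compares the result with the value obtained from the zero-order coefficients.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Product-moment coefficient of correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Deviations of y from its least-squares line of regression on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation of a with b, c constant (same form as 7.19)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical scores: X3 influences both X1 and X2.
random.seed(0)
x3 = [random.gauss(100, 15) for _ in range(200)]
x1 = [0.5 * v + random.gauss(0, 10) for v in x3]
x2 = [0.3 * v + random.gauss(0, 8) for v in x3]

r12, r13, r23 = pearson_r(x1, x2), pearson_r(x1, x3), pearson_r(x2, x3)
by_formula = partial_r(r12, r13, r23)
by_residuals = pearson_r(residuals(x1, x3), residuals(x2, x3))
print(round(by_formula, 6), round(by_residuals, 6))  # the two values agree
```

With the arguments permuted, the same function gives the other first-order coefficients: `partial_r(r13, r12, r23)` corresponds to (7.20) and `partial_r(r23, r12, r13)` to (7.20').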




The partial coefficients defined above are known as first-order 
coefficients. The simple coefficients r12, etc. are known as zero-order 
coefficients. Hence, it may be said that first-order coefficients are com- 
puted from zero-order coefficients. 


USE AND INTERPRETATION OF PARTIAL CORRELATION 


The chief use of partial correlation is that of determining what the 
correlation between two variables would be if a third variable were not 
interfering with the relationship. The partial correlation coefficient 
r12.3 may be thought of as a measure of the net correlation between X1 
and X2, the influence of X3 being eliminated. There are two important 
assumptions underlying the technique: (1) linearity of regression of the 
two variables upon the third variable, and (2) equal scattering of the 
values of the two variables for different values of the third variable. 
The second assumption is analogous to the assumption of homosce- 
dasticity in the two-variable problem. 

Neither of the assumptions will ordinarily be fully satisfied in 
practical applications; hence, a partial coefficient of correlation should 
be regarded as a sort of average value. The coefficient may obscure 
significant relationships between X1 and X2 for certain values of X3. 
Specifically, if we had a large group of individuals measured on arith- 
metic problem solving ability, reading ability, and mental age, we might 
find the correlation between problem solving ability and reading ability 
at one level of mental age to be quite different from that at other levels. 
This is a serious limitation of partial correlation as a research tool, and 
rarely if ever can it be considered an acceptable substitute for experi- 
mental control. 

In spite of its limitations, however, partial correlation is useful, 
particularly in preliminary investigations of relationships. In any 
situation in which it is logical to think of experimentally controlling a 
variable which may or may not be interfering with the relationship 
between two others, partial correlation may be used. 

Partial correlation technique can be extended to eliminating the 
linear effects of more than one variable. (See Ref. 39, pp. 433-436.) 


THE REGRESSION EQUATION IN THREE VARIABLES ... 


The second kind of correlation problem involving three or more 
variables is that of determining the regression equation of the linear 
relationship between one of the variables and the other variables con- 
sidered as a team and of measuring the strength of the relationship. 




Consider the three-variable case in which it is desired to determine 
the equation of the linear relationship between X1, a dependent vari- 
able, and X2 and X3, independent variables. In practical work, X1 
might be problem-solving ability, X2 reading ability, and X3 mental age. 
Or, X1 might be semester averages of a group of college freshmen, X2 
their high school averages, and X3 their scores in a scholastic aptitude 
test. As another example, X1 might be measures of speed of typing, X2 
measures of finger dexterity, and X3 measures of reaction time. The 
practical applications are numerous. 

If we plot X1, X2, and X3 in a tridimensional scatter diagram, the 
dots will tend to form an ellipsoid, if relationship exists. The equation 
that best summarizes (in the least-square sense) the relationship of X2 
and X3 to X1 will be the equation of the plane which cuts the ellipsoid 
in such a way that the sum of squares of the deviations of X1 from the 
plane is a minimum. This plane is analogous to the line which best fits 
the dots in a two-dimensional scatter diagram. It is called the plane of 
regression of X1 on X2, X3. 

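The equation of this best-fit plane is, in raw score form, 

    X_1' = b_{12.3} \frac{s_1}{s_2} (X_2 - \bar{X}_2) + b_{13.2} \frac{s_1}{s_3} (X_3 - \bar{X}_3) + \bar{X}_1,    (7.21)

where b12.3 and b13.2 are defined by 

    b_{12.3} = \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2},    (7.22)

    b_{13.2} = \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2}.    (7.22)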


The coefficients b12.3 and b13.2 are partial regression coefficients. 
The symbol b12.3, for example, refers to the net regression of X1 on X2 
with X3 held constant. These coefficients are frequently called beta 
coefficients. 

The use of equation (7.21) to predict X1 from X2 and X3 is similar 
to the use of equation (7.7) in predicting Y from X. To illustrate, we 
shall predict semester averages, X1, from Regents' averages, X2, and 
SAT scores, X3, using the data for the 76 freshmen of Table A, Ap- 
pendix, who have Regents' averages. 

The first step is to find the coefficients of correlation between X1 
and X2, X1 and X3, and X2 and X3, and the means and standard 
deviations of the variables. Four of the 76 freshmen who have Regents' 
averages do not have SAT scores. We may either exclude the four or we 
may record the average SAT score of the 72 for each of the four. 
Either procedure may be justified; we shall apply the latter. It will be 
seen that recording the average SAT score for the four has little effect 
on the statistics. When the SAT scores are grouped in the correlation 
table by intervals of 50 and the semester and Regents’ averages are 
grouped as in Table 7.3, the statistics obtained from the three correla- 
tion tables, some of which we have previously found (p. 138), are 


                       SEMESTER         REGENTS'         SAT 
                       AVERAGES, X1     AVERAGES, X2     SCORES, X3 
Mean                   75.19            87.16            1084.37 
Standard deviation     s1 = 7.24        s2 = 4.97        s3 = 121.38 

r12 = .679        r13 = .493        r23 = .503 


The next step is to compute the partial regression coefficients. Sub- 
stituting the correlation coefficients in formulas (7.22) we get 

    b_{12.3} = \frac{.679 - (.493)(.503)}{1 - (.503)^2} = .577,

    b_{13.2} = \frac{.493 - (.679)(.503)}{1 - (.503)^2} = .203.


The third step is to substitute the regression coefficients, means, and 
standard deviations in equation (7.21) and simplify. Thus, 

    X_1' = .577 \frac{7.24}{4.97} (X_2 - 87.16) + .203 \frac{7.24}{121.38} (X_3 - 1084.37) + 75.19
         = .841 X_2 + .0121 X_3 - 11.23.

This is the equation of the plane of regression of semester averages 
on Regents' averages and SAT scores, considered jointly, for the given 
data. It summarizes the linear relationship between the dependent 
variable and the two independent variables. It is the equation to use 
if we wish to predict a semester average, X1, for an applicant, given his 
Regents' average, X2, and his SAT score, X3. For example, given 
X2 = 85.0 and X3 = 1100, we get X1' = .841(85.0) + .0121(1100) - 
11.23 = 73.6. Thus, 73.6 is the semester average we predict for an 
applicant who has a Regents' average of 85.0 and a SAT score of 1100. 
Predictions of X1 for other values of X2 and X3 are similarly made. 
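
The three steps of the illustration may be sketched in Python as follows. The function and variable names are our own, chosen for illustration; the statistics are those reported above. Because the sketch carries unrounded coefficients, its constant term comes out near -11.19 rather than the -11.23 obtained from the rounded values, but the predicted average is the same to one decimal place.

```python
def regression_plane(r12, r13, r23, means, sds):
    """Return b12.3, b13.2 (formulas 7.22) and the raw-score weights and
    constant of the plane of regression of X1 on X2, X3 (formula 7.21)."""
    b12_3 = (r12 - r13 * r23) / (1 - r23 ** 2)
    b13_2 = (r13 - r12 * r23) / (1 - r23 ** 2)
    m1, m2, m3 = means
    s1, s2, s3 = sds
    slope2 = b12_3 * s1 / s2          # weight applied to X2
    slope3 = b13_2 * s1 / s3          # weight applied to X3
    intercept = m1 - slope2 * m2 - slope3 * m3
    return b12_3, b13_2, slope2, slope3, intercept

b12_3, b13_2, slope2, slope3, intercept = regression_plane(
    0.679, 0.493, 0.503,
    means=(75.19, 87.16, 1084.37),
    sds=(7.24, 4.97, 121.38),
)

# Predicted semester average for a Regents' average of 85.0 and SAT of 1100.
predicted = slope2 * 85.0 + slope3 * 1100 + intercept
print(round(b12_3, 3), round(b13_2, 3))   # 0.577 0.203
print(round(predicted, 1))                # 73.6
```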

It is possible to determine the regression equation which best 
summarizes the linear relationship between a dependent variable and 
three or more independent variables. (See Ref. 35, Chap. 8, for the 
general case.) 




COEFFICIENT OF MULTIPLE CORRELATION 

If we predicted X1 for each of the 76 freshmen of the above 
example, we could pair the predicted and actual semester averages and 
correlate them. If we did this, we would obtain the coefficient of correla- 
tion, which is known as the multiple coefficient of correlation of X1 
with X2 and X3 and is usually denoted by the symbol R1.23. 

In general, the coefficient of multiple correlation, R1.23, is defined 
as the product-moment coefficient of correlation between the observed 
values of a variable X1 and the theoretical values given by the equation 
of linear regression of X1 on X2, X3. The coefficient could always be 
found by correlating the theoretical scores with the observed scores. 
This, however, is not necessary. It can be shown that R1.23 is given by 

    R_{1.23} = \sqrt{r_{12} b_{12.3} + r_{13} b_{13.2}}.    (7.23)

Let us use (7.23) to find the multiple coefficient of correlation of 
semester averages with Regents' averages and SAT scores in the illustra- 
tive example above. The needed correlation coefficients are .679 and 
.493 and the corresponding regression coefficients are .577 and .203, 
respectively. Substituting, we have 

    R_{1.23} = \sqrt{.679(.577) + .493(.203)} = .701.


This is the coefficient we would get if we correlated the 76 predicted or 
theoretical semester averages with the actual semester averages. 

The multiple coefficient may be interpreted in the same way as the 
simple coefficient, rxy. It is the simple correlation between observed 
and theoretical values of a dependent variable. Since the theoretical 
values are obtained from the regression equation involving the inde- 
pendent variables X2 and X3, the multiple coefficient indicates the extent 
to which variation of X1 is associated with the joint variation of X2 
and X3. In our example, (.701)^2 or 49 per cent of the variance of 
semester averages is accounted for by variation of Regents' averages 
and SAT scores, and 51 per cent of the variance by variables other than 
these two. 
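
A short Python sketch of formula (7.23), using the three zero-order coefficients of the illustrative example, is given below. The function name `multiple_r` is ours, not the text's.

```python
import math

def multiple_r(r12, r13, r23):
    """Multiple correlation of X1 with X2 and X3 (formulas 7.22 and 7.23)."""
    b12_3 = (r12 - r13 * r23) / (1 - r23 ** 2)
    b13_2 = (r13 - r12 * r23) / (1 - r23 ** 2)
    return math.sqrt(r12 * b12_3 + r13 * b13_2)

R = multiple_r(0.679, 0.493, 0.503)
print(round(R, 3))        # 0.701
print(round(R ** 2, 2))   # 0.49, the proportion of variance accounted for
```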

The multiple coefficient of correlation has an important use in 
measuring the scatter of the scores about the plane of regression and in 
judging the accuracy of prediction. This is our next topic. 


THE STANDARD ERROR OF ESTIMATE IN MULTIPLE REGRESSION 


In an earlier section, we found that the standard error of estimate, 
sy.x, in the two-variable problem was merely the standard deviation of 
the differences between the observed scores Y and the theoretical scores 
Y', or the standard deviation of the residuals about the line of regres- 
sion of Y on X. 

The standard error of estimate in the three-variable problem is the 
standard deviation of the X1 residuals about the plane of regression of 
X1 on X2, X3, and is denoted by s1.23. In the illustrative problem, if 
we were to predict semester averages for the 76 freshmen, find the differ- 
ences between predicted and actual averages, and compute the standard 
deviation of the differences, we would have s1.23. We could always 
find s1.23 this way, but it is not necessary. Analogously to formula 
(7.9) we may write 

    s_{1.23} = s_1 \sqrt{1 - R_{1.23}^2}.    (7.24)


Thus, to find the standard error of estimate, we need only the standard 
deviation of the dependent variable and the coefficient of multiple cor- 
relation. In the illustrative problem, for example, where s1 is 7.24 
and R1.23 is .701, s1.23 is 7.24\sqrt{1 - (.701)^2} or 5.16. 

Assuming that the residuals are normally distributed and equally 
scattered, s1.23 may be used in exactly the same way as sy.x to construct 
approximate confidence bands for scores yet to be earned or observed 
and to calculate the chances that a score will be below (or above) some 
specified value (pp. 148-151). 
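
As a rough sketch of these uses, the Python lines below compute s1.23 from formula (7.24), form an approximate 68 per cent band around the predicted average of 73.6 found earlier, and estimate the chance of an average below 70. The cut-off of 70 is a hypothetical value chosen only for illustration, and the calculation rests on the normality and equal-scatter assumptions just stated.

```python
import math
from statistics import NormalDist

s1, R = 7.24, 0.701
se = s1 * math.sqrt(1 - R ** 2)          # formula (7.24)
print(round(se, 2))                      # 5.16

predicted = 73.6                         # prediction from the illustration
band_68 = (predicted - se, predicted + se)   # approximate 68 per cent band

# Chance that the earned average falls below a hypothetical cut-off of 70,
# treating residuals as normal with standard deviation se.
p_below_70 = NormalDist(mu=predicted, sigma=se).cdf(70.0)
print(band_68, p_below_70)
```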


ANALYSIS OF MULTIPLE RELATIONSHIPS 


In the main, R1.23 is interpreted in the same way as rxy. It enters 
into the standard error of estimate in the same way as does rxy and 
thus, like rxy, indicates the efficiency of prediction. In this connection, 
the values of Table 7.4 may be used in interpreting R1.23. 

We have noted that the square of R1.23 indicates the proportion of 
variance of the dependent variable which is explained by or accounted 
for by the variation of the two independent variables. In the 
three-variable problem it may be shown that 

    R_{1.23}^2 = b_{12.3}^2 + b_{13.2}^2 + 2 r_{23} b_{12.3} b_{13.2}.    (7.25)


This expression indicates that the direct contribution of X2 to the ex- 
plained variance of X1 is equal to the square of b12.3; that of X3 is 
equal to the square of b13.2; and that the indirect contribution, resulting 
from the intercorrelations between the variables, is equal to 
2 r23 b12.3 b13.2. Let us go back to the illustrative problem to apply 
this idea. There, R1.23 is .701, b12.3 is .577, b13.2 is .203, and r23 is 
.503. Substituting these values in (7.25) we have 

    (.701)^2 = (.577)^2 + (.203)^2 + 2(.503)(.577)(.203)

    .491 = .333 + .041 + .118.


This tells us that of the percentage of variance of semester averages, 
49.1, which is explained or accounted for by the variation of Regents' 
averages and SAT scores, 33.3 per cent is explained by Regents' 
averages, 4.1 per cent by SAT scores, and 11.8 per cent by the inter- 
correlation between the variables. Parenthetically, let us note that it is 
typically the case that achievement data, such as Regents' averages and 
high school grades, make from five to ten times more direct contribu- 
tion than do aptitude test data to the variance of college freshmen 
grade averages. 
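
The decomposition in (7.25) can be verified numerically. In the sketch below (variable names are ours) the unrounded regression coefficients are used, so the three parts sum to about .492; the .491 in the text comes from squaring R1.23 after rounding it to .701.

```python
r12, r13, r23 = 0.679, 0.493, 0.503

b12_3 = (r12 - r13 * r23) / (1 - r23 ** 2)
b13_2 = (r13 - r12 * r23) / (1 - r23 ** 2)

direct_2 = b12_3 ** 2                 # direct contribution of X2
direct_3 = b13_2 ** 2                 # direct contribution of X3
joint = 2 * r23 * b12_3 * b13_2       # indirect (intercorrelation) part
total = direct_2 + direct_3 + joint

r_squared = r12 * b12_3 + r13 * b13_2     # square of (7.23)
print(round(direct_2, 3), round(direct_3, 3), round(joint, 3))  # 0.333 0.041 0.118
print(round(total, 3), round(r_squared, 3))                     # 0.492 0.492
```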


CONCLUDING REMARKS 


The principal assumption underlying multiple correlation analysis 
is that the relationship between each pair of variables is linear, i.e., that 
the data in the correlation tables, from which the zero-order coefficients 
are computed, follow a linear pattern. The standard error of estimate 
in multiple regression assumes that the residuals about the regression 
plane are normally distributed and homoscedastic. It will be seen that 
the assumptions are similar to those in simple correlation and regression. 

Multiple correlation and regression methods may be used in any 
situation where we have a sample of individuals measured on three or 
more variables, provided the assumptions are tenable. The methods 
have important applications in advanced statistics. They are related 
to factor analysis, the discriminant function, and the analysis of co- 
variance with control on two or more variables. 

We have considered only the three-variable case in connection with 
prediction and the analysis of relationships. In practical work, there is 
usually little to be gained by adding variables. Actually, it is frequently 
the case that prediction and analysis are nearly as efficient where only 
one independent variable is used. When there are several independent 
variables to choose from, the best single predictor is, of course, the one 
which shows the highest correlation with the criterion; the best two 
predictors are the two which show relatively good correlation with the 
criterion and relatively low correlation with each other. 
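
The rule for choosing two predictors can be illustrated with a small sketch. Combining (7.22) and (7.23) algebraically gives the square of the multiple coefficient for a criterion c and predictors a and b as (r_ca^2 + r_cb^2 - 2 r_ca r_cb r_ab) / (1 - r_ab^2). The correlations below are invented for illustration and do not come from Table A; predictors A and B each correlate well with the criterion but also highly with each other.

```python
from itertools import combinations

def r_squared_two(r_ca, r_cb, r_ab):
    """Squared multiple correlation of a criterion with two predictors."""
    return (r_ca ** 2 + r_cb ** 2 - 2 * r_ca * r_cb * r_ab) / (1 - r_ab ** 2)

# Hypothetical correlations of three predictors with the criterion ...
validity = {"A": 0.60, "B": 0.55, "C": 0.40}
# ... and hypothetical correlations among the predictors themselves.
inter = {("A", "B"): 0.80, ("A", "C"): 0.10, ("B", "C"): 0.15}

pairs = list(combinations(sorted(validity), 2))
for pair in pairs:
    a, b = pair
    print(pair, round(r_squared_two(validity[a], validity[b], inter[pair]), 3))

best = max(pairs, key=lambda p: r_squared_two(validity[p[0]], validity[p[1]], inter[p]))
print(best)   # ('A', 'C')
```

In this made-up case the pair A, C does better than A, B, although B alone correlates more highly with the criterion than C does, because A and B overlap heavily.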

We have only touched on the many problems in multiple correla- 
tion and regression. For more complete accounts, the student is re- 
ferred to References 23 and 35. The former is particularly useful after 
one has knowledge of sampling theory. It presents thorough and pre- 
cise methods of dealing with the multivariate data on any number of 
independent variables. 


EXERCISES 


. Can you think of any facts or variables unrelated to others? Is informa- 
tion about these worth anything at present? 

. Can you think of any relationships between variables which are not useful? 

. Suppose that an investigator wished to determine whether conservative 
attitudes were related to chronological age. How would he proceed? 

. From your own observation what are some variables related to intelli- 
gence? Scholastic achievement? Juvenile delinquency? Low income? 
Continuation in school? 

. Consider the statement, "So many accepted relationships have been 
proved false that the study of relationships is futile in social science." 

. What would you expect to be the nature of the correlation in each of the 
following? State other variables which may affect the relationship in 
each. 

   a. Vocabulary and mental age. 
   b. Age and liberalism. 
   c. Speed of response and scores on a timed intelligence test. 
   d. Speed of response and scores on an untimed intelligence test. 
   e. Aptitude and hours of study in a given subject. 
   f. Hours of study and number of statistics exercises worked. 

8. Suppose that you have 10 pairs of scores. Indicate by graphs (similar to 
7.1 or 7.2) the pattern the scores would make if they were characterized 
by (a) perfect positive correlation, (b) imperfect positive correlation, 
(c) zero correlation, (d) imperfect negative correlation, (e) perfect nega- 
tive correlation. 

9. Perfect positive correlation is present when the members of each pair of 
standard scores are equal. Show that for perfect positive correlation, 
rxy = 1. 

10. Perfect negative correlation is present when the members of each pair of 
standard scores have the same absolute value, but differ in sign. Show 
that for perfect negative correlation rxy = -1. 

11. Give a numerical example or construct a general proof to show that 
changing all of the X scores by a constant or all of the Y scores by a 
constant does not affect the value of rxy. 

12. The scores of 17 pairs of husbands and wives on a questionnaire regard- 
ing child care are given below. Find and interpret rxy. 


HUSBANDS: 30 38 31 27 28 23 24 20 25 34 12 29 39 32 42 44 32 
WIVES: 39 36 29 29 41 32 20 32 22 29 26 43 43 29 47 48 27 


13. Show that formulas (7.1) and (7.2) are equivalent. 

14. The head of a college foreign language department once said that high 
school language grades were the best single predictor of college grades. 
What evidence bearing on this assertion can you obtain from the data 
of Table A, Appendix? What are some limitations of the evidence? 

15. Suggest other correlation studies of the data of Table A, Appendix. State 
the limitations of each. 

16. Why are the totals of columns 3, 5, and 6 equal respectively to the totals 
of rows e, c, and f in Table 7.3? 

17. Would a change in either or both grouping schemes in the correlation 
table affect the correlation coefficient? Explain. 

18. Find the coefficient of correlation between Regents' Language and College 
Language, Table A, Appendix. Interpret. 

19. The figures below are based on data reported by the World Almanac, 
1950, 1964. Would rxy fairly measure the relationship between public 
school enrollment and per capita expenditure? Explain. 


YEAR     PUBLIC SCHOOL ENROLLMENT     PER CAPITA EXPENDITURE 
         (MILLIONS)                   (DOLLARS) 
1880      9.87                          7.91 
1890     12.72                         11.04 
1900     15.50                         13.87 
1910     17.81                         23.93 
1920     21.58                         48.02 
1930     25.68                         90.22 
1940     25.43                         92.16 
1950     25.11                        232.47 
1960     36.09                        432.66 


20. If data are negatively correlated, show that the sign of the coefficient of 
regression of Y on X is negative. What is the direction of the regression 
line for negatively correlated data? 

21. Show that the equation of regression of Y on X, in standard score form, is 
zy = rxy zx. 

22. Referring to the illustrative example in the text, pp. 147-151, 

   a. What semester average would be predicted for an applicant having a 
      Regents' average of 90.0? 

   b. If 100 applicants having Regents' averages of 90.0 were admitted, how 
      many would be expected to make semester averages below 75? 

   c. What are the chances that an applicant having a Regents' average of 
      90.0 would, if admitted to college, make a semester average of 75 or 
      above? 

   d. What is the 68 per cent confidence band for the future semester average 
      of an applicant having a Regents' average of 90? 

   e. What are the conditions and assumptions underlying the above pro- 
      cedures? 


23. In a school of dentistry it was found that scores on a chalk-carving test 
were correlated with grades in a basic technics course. The statistics were 


CARVING TEST, X        TECHNICS GRADES, Y 
Mean X = 300           Mean Y = 80.0 
sx = 50                sy = 10.0 
              rxy = .60 


   a. Under what conditions can the technics grades for applicants logically 
      be predicted from the carving test scores? 
   b. What grade would be predicted for an applicant who scored 25 on the 
      carving test? 
   c. What are the assumptions under which the accuracy of the grade 
      predicted in (b) can be estimated? 
   d. What is the standard error of estimate? 
   e. What are the chances (probability) that the grade predicted in (b) 
      would not in fact be 60 or lower? 

24. In a cross-validation study, Huck (Ref. 21) examined, among other 
things, the differences between earned grade-point averages and the 
averages which had been predicted at the time of admission in a sample 
of 130 dental school students. The standard error of estimate was 5.40. 
The distribution of differences is shown below. What percentage of the 
differences are within ±5.40 of their predicted value? Within ±10.80 of 
their predicted value? What is the standard deviation of the distribution? 


DIFFERENCE        f 
 11 to  13        3 
  8 to  10        8 
  5 to   7       11 
  2 to   4       21 
 -1 to   1       26 
 -4 to  -2       25 
 -7 to  -5       17 
-10 to  -8       11 
-13 to -11        4 
-16 to -14        3 
-19 to -17        1 


25. Describe several ways of interpreting rxy and describe a situation in 
which each way is appropriate. 

26. Describe a situation in which an rxy of about zero would be important. 

27. Plot the values of rxy and \sqrt{1 - r_{xy}^2} of Table 7.4 in a graph. Interpret 
the resulting curve. 

28. In Exercise 23, above, how much of the variance of technics grades is 
accounted for or explained by the correlation with the chalk-carving 
scores? How much is unexplained? How large would rxy have to be to 
explain half of the variance? 


29. Given the scores below, compute rxy and write the equation of regression 
of Y on X. Predict Y' for each X. Find the variances of the Y' scores and 
the residuals, Y - Y'. Numerically check equations (7.11) and (7.12) 
and the equation s_y^2 = s_{y'}^2 + s_{y.x}^2. 


X    105    102     99     96     93 
Y     78     80     74     72     76 


30. In a relatively heterogeneous sample, the coefficient of correlation be- 
tween high school and freshmen college grades was found to be .72. If 
the standard deviation of high school grades for the group was 9.0, what 
coefficient would be expected (other things being equal) in a college where 
selection on the basis of high school grades resulted in a freshman class in 
which the standard deviation of high school grades was 4.0? 

31. In Exercise 12, above, the mean of the husbands' scores is 30.0 and the 
standard deviation is 7.9. The mean of the wives' scores is 33.6 and the 
standard deviation is 8.3. The five highest husbands' scores are 44, 42, 39, 
38, and 34, with a mean of 39.4. These are paired with wives' scores of 
48, 47, 43, 36, and 29, with a mean of 40.6. Convert 39.4 and 40.6 to 
standard scores in their respective series and interpret the results. Now 
select the five highest wives' scores, find the mean of these and the mean 
of the paired husbands' scores, convert the means to standard scores, 
and interpret the results. 

32. A school counselor noticed that students of low IQ's as a group made 
better school marks in proportion to their ability than students of high 
IQ's. The counselor concluded that the students of low IQ's were trying 
harder. Criticize the conclusion. 

33. An instructor found that a new method of teaching English resulted in 
relatively large gains for students who had been low on the pretest, but 
in relatively small gains for students who had been high. He concluded 
that the method was more suitable for the poorer students than for the 
better. Comment. 

34. Find the rank difference coefficient of correlation in Exercise 12, above. 

35. Find the rank difference coefficient of correlation in Exercise 19, above. 
Why is ra a better measure of relationship than rxy in this case? 

36. Find the point biserial coefficients of correlation of items 6, 11, and 13 
with the total mathematics test scores in Table 8.2. 

37. In a study of teaching efficiency, it was found that of 52 successful teachers, 
38 had held one or more elective offices as students in high school or 
college. In a group of 37 unsuccessful teachers, only 11 had held such 
offices. Classify the data in a 2 X 2 table and find rp. Interpret. 

38. In an investigation of the relationship between semester averages X1, 
hours of study per week X2, and scholastic aptitude X3, the zero-order 
coefficients were found to be r12 = -.05, r13 = .40, and r23 = -.50. 


Find the coefficient of correlation between semester averages and hours 
of study, with scholastic aptitude constant. Interpret this net correlation 
coefficient. 

39. Look at the data of Table A, Appendix. Describe step by step how you 
would (a) find the equation of regression of semester average on VAT 
and MAT, (b) determine the joint contribution of VAT and MAT to the 
variance of semester averages, (c) compare the relative importance of 
VAT and MAT in predicting semester averages, and (d) determine the 
standard error of estimate. 

40. The three-variable regression equation is sometimes written in terms of 
partial correlation coefficients, standard errors of estimate, and deviation 
scores as seen below. Reconcile this equation with equation (7.21): 

    x_1' = r_{12.3} \frac{s_{1.3}}{s_{2.3}} x_2 + r_{13.2} \frac{s_{1.2}}{s_{3.2}} x_3


CHAPTER VIII 


Reliability and Validity 
of Statistical Evidence 


It is the aim of statistical procedures to obtain trustworthy evidence 
which can be used in solving problems. This aim has been emphasized 
in the preceding chapters, but as yet little has been said about the 
conditions of trustworthy evidence. 

The ultimate test of evidence, of course, is to determine whether 
the generalizations it supports are useful in prediction; i.e., whether 
they enable us to say with some degree of confidence, “If this is done, 
that will happen." But if this test were the only one available, a gen- 
eralization would be little better than a “try-it-and-see” suggestion. 

Observational evidence is subject to error, and generalizations 
drawn therefrom consequently are in doubt to some extent. It usually 
is possible, however, to prevent certain errors and to make allowances 
for those that cannot be prevented. When this can be done, the un- 
certainty of the generalizations is reduced. 

It is the peculiar advantage of statistics as a tool in research that 
it provides rational methods of estimating the extent to which observa- 
tional evidence may be in error. We shall consider these methods in 
some detail in the present and following chapters. 


The Conditions of Trustworthy Evidence 


The most important condition of trustworthy evidence obviously 
is that it be relatively free from error. Statistical data are subject to 
three important kinds of errors: sampling errors, errors of measure- 
ment, and constant errors or bias. 

Sampling errors are errors that are due to chance or sampling 
fluctuations. Sample evidence, as has been emphasized at various 
points in preceding pages, is always suspect to some extent because of 
inevitable chance fluctuations. The mean of a sample from a specified 
population, for example, will seldom agree with the mean of a second 
sample from the same population, nor will it ordinarily be equal to the 
mean of the population. 

Errors of observation or measurement are defined as random 
errors made in the application of an instrument, such as a meter stick, 
spring balance, psychological test, or questionnaire in attempting to 
determine some “dimension” of an object. They include variable 
mistakes in reading the instrument, errors resulting from variable 
conditions that affect the instrument or the object; in short, all of the 
inaccuracies due to influences present during the measuring process that 
affect the results in random manner. These errors are primary, in that 
they are inherent in all original observations. It is important to notice 
that errors of measurement are, or are assumed to be, random. 

Constant errors or errors of bias result when there are present in 
the measuring process, or in the selection of a sample, nonrandom or 
systematic influences which prejudice the observations. Bias may 
affect the original observations, as would be the case if a distorted 
instrument were used, or it may affect sample evidence, if it enters in 
the selection of a sample. When the amount of bias is known or can 
be estimated, as is sometimes the case, it is not difficult to adjust or 
correct observations accordingly. However, when errors of bias arise 
due to carelessness in sampling or observing, unwarranted assumptions, 
or failure to consider all of the evidence pertinent to a given question, 
as is frequently the case, satisfactory corrections are usually impos- 
sible. Wilson (Ref. 53) describes the subtle and pervasive nature of 
bias in research and suggests ways of combating it. Because of pos- 
sible bias, one should always be critical of observational procedures 
and thoughtfully skeptical about evidence after it is collected. 

Although there is a very real connection between errors of ob- 
servation and sampling errors, it is convenient to deal with the two 
separately. In this chapter we shall consider errors of observation 
and related topics; sampling errors, as such, will be taken up in Chap- 
ter IX, “Statistical Inference.” 

There are two fundamental questions regarding the quality of 
original observations, or any evidence for that matter: (1) whether 
the evidence observed faithfully represents the situation it is supposed 
to represent or really means what it is considered to mean and (2) 
whether a second, independent observation will yield evidence con- 
sistent with the first. It is conventional to subsume considerations of 
the first question under the term validity, and those relating to the 
second under reliability. 


THE MEANING OF VALIDITY 


It is commonly said that the first condition of trustworthy evi- 
dence is that of validity. This condition is interpreted to mean that the 
evidence must be relevant to the issue it is supposed to throw light 
upon or that it must accomplish the purpose for which it is collected. 
If a set of historical facts truly pictures some particular circumstance 
of the past, the facts are said to be valid; if an achievement test of, 
say, algebra really measures achievement in algebra, it is a valid test; 
if an intelligence test, or some other aptitude test, is used to predict 
success in academic work or in a vocation, it is said to be valid if it 
really predicts success. Methods of collecting valid evidence are said 
to be valid. Thus, we may talk about either valid observations or valid 
instruments with no loss of meaning. 

In current psychological testing, four types of validity are iden- 
tified: content, construct, concurrent, and predictive. The student 
will find comprehensive treatment of the four types in Reference 1. 
For our purposes it is desirable to distinguish between only two types. 
The first we shall call logical, or formal. To have formal validity, 
evidence must agree, in nature and in method of collection, with 
specifications that are set up on the basis of prior information. For 
example, if a test is constructed in accordance with definitions, with a 
body of content, or with opinions of authorities, it may be considered 
to be formally valid. If intelligence is defined as the ability to do 
certain paper and pencil tasks, such as arithmetic reasoning, sentence 
completion, and word definition, a test containing these tasks is a 
formally valid test of the intelligence of the individuals for whom it is 
designed. If an achievement test is in agreement with the content 
which was taught, it is formally valid in the given situation. If, in the 
opinion of mental hygienists, a checklist or questionnaire relating to 
individual adjustment adequately covers the aspects of adjustment in 
question, it is considered to be formally valid. 

The second kind of validity we shall call experimental. Evidence 
is experimentally valid if it accomplishes the purpose for which it is 
gathered. For example, if interview, questionnaire, or aptitude test 
data are gathered as evidence of individual qualifications for school 
or a job, the data are experimentally valid to the degree that they 
predict success or lack of success in school or on the job. Evidence 
gathered as a basis for revising a school curriculum in accordance 
with pupil needs is experimentally valid if it can be shown that the 
revised curriculum actually functions to meet pupil needs. The con- 
dition of experimental validity of evidence obviously is coextensive 
with the pragmatic test of the results of research, mentioned earlier. 

It will be noted that both kinds of validity are stated in terms of 
something outside of the evidence itself. Formal validity depends 
upon whether the evidence or the method of collecting the evidence 
agrees with specifications which have been set up in advance. Ex- 
perimental validity depends upon whether the evidence fulfills its 
purpose. Thus, validity explicitly demands a criterion or criteria out- 
side of the evidence. It follows that questions pertaining to validity 
are always specific. Evidence can never be said to be valid in a general 
sense. It is valid because it agrees with a particular set of specifications 
or because it accomplishes a particular purpose. Statements regarding 
the validity of evidence have meaning only when the criteria and the 
validation procedures are completely described. It must be kept in 
mind that criteria and validation procedures are themselves usually 
open to question. There are various ways of arriving at criteria, 
whether they be measures of academic or vocational success or formal 
specifications to be met. Hence, detailed description regarding how
and why particular criteria are used is an essential step in reporting
research.


There is a great deal of ambiguity and circularity in the concept
of formal validity, perhaps even an element of medieval scholasticism.
The definitions, specifications, or other criteria set up by one group of 
teachers, authorities, or experts will rarely agree with those set up by a 
second. Since the time of Galileo, the limitations of formal validity 
in research have been rather generally recognized. This fact is not as 
damaging, however, as it might first seem. If we keep in mind that the 
ultimate test of evidence is the pragmatic test—i.e., the demonstration 
that the evidence is useful in prediction and explanation—the principle 


Reliability and Validity of Statistical Evidence / 185 


of formal validity is a helpful one. Insofar as the principle enables 
us to eliminate guesswork and to exclude irrelevant evidence or un- 
promising attempts to gather evidence, to that extent it is of value. 
An aptitude test or questionnaire drawn up according to considered 
specifications presumably will work better than one which is not. The 
important thing is not to confuse formal validity with experimental
validity and not to be content with formal validity.

While experimental validity is neither ambiguous nor circular, it 
does demand criteria which may be difficult to provide. The prob- 
lem of measuring success in school or on the job is never easy, nor is it 
easy to evaluate the results of a program in action. The selection of 
criteria is generally difficult. 

When evidence can be statistically correlated with criteria, the
coefficient of correlation is customarily designated the validity
coefficient. Any of the correlational methods discussed in the preceding
chapter may be used in measuring the relationship between evidence
and criteria, provided, of course, the assumptions underlying a
particular method are satisfied.
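
As an illustration only, a validity coefficient of the product-moment kind might be computed as sketched below. The aptitude scores and the criterion values are hypothetical, and the function name `pearson_r` is an assumption introduced for this example; the sketch presumes that the assumptions of the product-moment method are acceptable for the data at hand.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or 2 > len(x):
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either list is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical evidence (aptitude test scores) and criterion (later grade averages).
aptitude = [52, 47, 60, 38, 55, 41, 66, 49]
criterion = [2.9, 2.4, 3.3, 2.1, 3.0, 2.6, 3.7, 2.5]

validity_coefficient = pearson_r(aptitude, criterion)
print(round(validity_coefficient, 2))
```

The coefficient has meaning only with reference to the particular criterion used, which is why the criterion is passed in explicitly rather than assumed.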

The concept of validity is fundamental and never to be forgotten,
but in its present ramifications in psychological research it is an elusive
concept and one whose demands permit various interpretations. In the
last analysis, questions regarding the validity of evidence have to be
resolved on a “try-it-and-see” basis. The writer has suggested
elsewhere (Ref. 41) that the concept of utility be substituted for that of
validity. It would seem straightforward and intelligible to consider
evidence as useful or not useful for a specified purpose.


THE MEANING OF RELIABILITY 

Research, as a method of solving problems, is unique in that its
results are publicly verifiable or verifiable on demand. This means
simply that the results can be checked and verified by any competent
observer. The researcher is influential, not because he observes
something no one else can see, but rather because he observes something
others can see when it is brought to their attention. It may be said that
a fundamental condition of evidence is that of dependability, or
reliability.

The evidence of research is said to be reliable if it can be verified
by impartial, independent observers, or if it agrees with evidence
obtained by independent repetitions of the process by which it was
first obtained. For example, historical evidence or legal testimony is
considered to be reliable if there is agreement between independent,
well-informed historians or witnesses. Psychological test, interview, 
or questionnaire data are considered reliable if a second application 
of the procedure by which they were originally collected yields data 
that are in agreement or consistent with the original data. A sample 
statistic, such as a mean or correlation coefficient, is reliable to the 
extent that it agrees with values yielded by further random samples 
from the same population. 

The idea of agreement or consistency between later and earlier 
independent observations, although fundamental in the general 
understanding of reliability, is unclear because "agreement" and 
"consistency" are not defined. Several questions arise immediately. 
Does reliability imply perfect agreement? If not, how much
disagreement can be tolerated? Since the best evidence of the present
may be contravened, even demolished, by the evidence of the future,
can any evidence be said to be reliable?

First, let us note that since all observation is characterized by
error to some extent, observational evidence can never be said to
be perfectly reliable. Second, reliability as it is used in research does
not imply "unchanging" or "changeless." The condition of
agreement between earlier and later observations loses meaning if time or
some other factor extrinsic to the measuring instrument or process
changes (or has opportunity to change) the objects under observation.
As was previously noted, measurement is an attempt to determine a
"dimension" of an object. If that dimension is changing during the
period of observation, a requirement of reliability obviously would be
disagreement between earlier and later observations.

The concept of reliability, as it is understood in research, means
only approximate agreement between independent observations of an
object or event, the only causes of disagreement admitted being errors
of observation, discussed above. Other possible causes of disagreement,
such as changing dimensions [...] one or all observations, are of
theoretical [...] "approximate agreement" and "independent
observation," as these are used in connection with reliability. Our
discussion of these phrases will be brought into better focus if we
confine it to psychological tests and test scores, although what we
have to say has wider application.


THE COEFFICIENT OF RELIABILITY 


The conventional methods of estimating the reliability of a 
psychological testing process are based upon correlating scores ob- 
tained by (1) applying the same test twice to a given group, (2) ad- 
ministering two parallel forms of a test to a group, and (3) dividing a 
single test into equivalent halves. The correlation coefficient thus
obtained indicates the extent of agreement between the two sets of
observed scores, or the self-correlation of the test. A product-moment
coefficient of correlation computed from the scores obtained by
procedures (1) or (2) is called a reliability coefficient and is denoted by
$r_{11}$ or just $r_1$. The coefficient computed from (3) is known as the
half-test or split-half reliability coefficient and is denoted by
$r_{\frac{1}{2}\frac{1}{2}}$ or just $r_{\frac{1}{2}}$.


Since the longer a test is, other things being equal, the more
reliable it is, the reliability coefficient computed from equivalent
halves underestimates the reliability of the whole test. There is a simple
formula available for estimating the reliability of the whole test from
the half-test coefficient, the Spearman-Brown prophecy or step-up
formula,

$$r_1 = \frac{2 r_{\frac{1}{2}}}{1 + r_{\frac{1}{2}}}. \tag{8.1}$$

By use of the formula we would obtain a reliability coefficient $r_1$ of
.79 from a half-test coefficient $r_{\frac{1}{2}}$ of .65, since we would
have $r_1 = 2(.65)/(1 + .65)$.

It is possible to estimate the reliability of a test n times as long 
as the test or part-test for which the reliability coefficient has been 
determined, provided the n parts are equivalent or truly comparable. 


The formula is

$$r_n = \frac{n r_1}{1 + (n - 1) r_1} \tag{8.2}$$

in which $r_n$ is the estimated reliability coefficient of a test n times as
long as the test or part-test whose reliability coefficient is $r_1$. It will
be noted that when n = 2, formula (8.2) reduces to (8.1). Formula
(8.2) can be used to "step-down" as well as "step-up" a reliability
coefficient. If it is desired to estimate the reliability of a test 1/n as
long as the original test whose reliability is known, (8.2) may be solved
for $r_1$. The formula may also be solved for n, in case it is desired to
estimate how much a test of known reliability needs to be lengthened
to have a specified reliability. (See Ex. 7 and 8.)
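
The uses of formula (8.2) described above may be sketched in a few lines of code. The function names are assumptions introduced for illustration, and the values .70 and .90 in the last step are hypothetical; the step-up from .65 is the example given in the text. The sketch presumes, as the formula does, that the parts of the lengthened or shortened test are truly comparable.

```python
def spearman_brown(r1, n):
    """Formula (8.2): reliability of a test n times as long as one with reliability r1."""
    return n * r1 / (1 + (n - 1) * r1)


def length_factor(r1, r_target):
    """Formula (8.2) solved for n: the lengthening needed to reach r_target."""
    return r_target * (1 - r1) / (r1 * (1 - r_target))


# Step-up: half-test coefficient of .65 to the whole test (n = 2), as in formula (8.1).
whole = spearman_brown(0.65, 2)
print(round(whole, 2))  # 0.79

# Step-down: a test half as long as the whole test (n = 1/2) recovers the half-test value.
half = spearman_brown(whole, 0.5)
print(round(half, 2))  # 0.65

# Hypothetical question: how many times must a test with reliability .70
# be lengthened to reach a reliability of .90?
n_needed = length_factor(0.70, 0.90)
print(round(n_needed, 2))
```

Entering a fractional n in (8.2) gives the same result as solving the formula for $r_1$, so a single function serves for both stepping up and stepping down.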

Before leaving the three common methods of estimating the re- 
liability of a test, we should note that, strictly speaking, only the test- 
retest method can be said to measure extent of agreement between 
repeated observations. Whether parallel forms or equivalent halves 
of a test measure the same thing is always debatable. The test-retest 
method, however, is at a disadvantage in measuring abilities which 
are sensitive to memory and learning effects. We shall have more to 
say about these methods later. At this point let us emphasize the 
fact that the estimates of reliability obtained by the different methods 
do not mean the same thing. The American Psychological Association 
(Ref. 2) has recommended that a coefficient obtained by the test-retest 
method be designated a coefficient of stability; one obtained from the 
parallel-forms method a coefficient of equivalence; and one obtained 
from the equivalent-halves method a coefficient of internal consistency. 
Whether or not the terms are adopted, the idea that the different 
methods of estimating reliability result in coefficients having somewhat 
different meaning is important. (See paragraphs c, d, p. 203.) 

Correlation methods of determining the extent of agreement
between independent observations have wide application. In any
situation [...] series of observations of presumably the same trait or
"dimension" of the individuals in a group, the reliability [...]
quantitative statement of the extent of agreement between
observations. The magnitude of the coefficient gives meaning
to the phrase "extent of agreement." When the coefficient is not zero,
there is some agreement between the observations; as the coefficient
approaches 1.00, better and better agreement is indicated.

The correlation methods of estimating [...] interpretations of
reliability, unless supplemented by understanding of errors of
measurement. This will be the topic of our next section.


Relations Between Errors of Measurement and Reliability 


Measurement results in a number which is taken to represent
some property of a thing. When we measure an individual, we seek a
numerical value which may be considered to represent the height,
weight, intelligence, opinion, or some other property of the individual. 
This numerical value, or observed score, is said to be reliable if re- 
peated observations yield consistent results under certain specified 
conditions. 

In order to understand the assumptions underlying reliability 
theory and to appreciate the consequences that result when the as- 
sumptions are not satisfied, it is necessary for us to approach the topic 
from a theoretical point of view. 


OBSERVED SCORES, TRUE SCORES, AND 
ERRORS OF MEASUREMENT 

The most comprehensive approach to the concept of reliability is
found in thinking of an observed score as representing a theoretically
correct or "true" value, plus an error of observation or measurement.
If we let $X_o$ be an observed score, $X_t$ the true score, and $E$ the error
of measurement, we may write

$$X_o = X_t + E. \tag{8.3}$$

The smaller the error $E$, of course, the more closely $X_o$ approximates
$X_t$; if there is no error, $X_o = X_t$. Unfortunately, we never know
either the true score $X_t$ or the error $E$, but we may think of a true
score as the arithmetic mean of a very large number of repeated
observations. For example, we may think of the true intelligence of
an individual, as measured by a test, as the mean of a very large
number of scores obtained by repeating the test, assuming the individual
unchanged by the process.

Although the concepts of true score and error of measurement 
are entirely hypothetical, we shall find them invaluable in coming to 
grips with reliability theory. It will help to clarify the concepts if 
we think about what a laboratory technician does when he wants to 
determine, say, the correct or true weight of a substance. It is standard 
practice to weigh the substance repeatedly and to take the arithmetic 
mean of the weights observed. When conditions which affect the weight
systematically are controlled, so that the errors in successive
observations are truly random, the arithmetic mean of the observed weights is
considered to be the true weight of the substance. Before leaving the
example, let us note another standard laboratory practice. After the
mean or hypothetical true weight is determined, the mean deviation
or the standard deviation of the observed weights would be reported
as an index of the precision of the weighing process.




It rarely is possible in measuring a “dimension” of a human being 
to repeat the measuring process under controlled conditions, and 
thereby to estimate the true score and magnitude of errors directly. 
The measuring process may change the individual, and the change 
resulting from one trial may carry over into a second. Various other 
influences may result in an actual change of true score or in corre- 
lation between errors during successive repetitions of the process. 
Hence, we ordinarily must estimate true scores from the results of 
only one or two measurements and the extent of error in the process 
in a gross or aggregate sort of way. The fact that this can be done 
rationally is of great importance in psychological measurement. 

Let us suppose that we know both the true scores and the
observed scores of 20 individuals, and that the errors of measurement
are perfectly compensating and uncorrelated with the true scores. If
we actually knew the true scores, of course, we would not need to
measure the individuals, but that thought need not detract from our
development of reliability theory. The supposititious data for the 20
individuals are shown in Table 8.1.

The student can verify that the means and standard deviations
of true scores, observed scores, and errors are as shown at the foot
of the table and that there is no correlation between true scores and
errors. It will be noted that the variance of the observed scores is
equal to the sum of the variances of true scores and errors. When
errors are uncorrelated with true scores, this relationship is always
true, and we may write

$$s_o^2 = s_t^2 + s_e^2 \tag{8.4}$$

where $s_o^2$ is the variance of the observed scores, $s_t^2$ is the variance of
the true scores, and $s_e^2$ is the variance of the errors. By transposing
and dividing by $s_o^2$, we obtain

$$\frac{s_t^2}{s_o^2} = 1 - \frac{s_e^2}{s_o^2}. \tag{8.5}$$


It will furthermore be noted that the mean of the true scores in Table
8.1 is equal to the mean of the observed scores. These facts are true
because (1) the errors are uncorrelated with true scores and (2) the
errors are perfectly compensating. The italicized conditions must
never be forgotten. They are of utmost importance in understanding
and interpreting reliability.




TABLE 8.1

Theoretical Relations Between True Scores, Observed Scores
and Errors of Measurement

          TRUE SCORE    OBSERVED SCORE    ERROR
            $X_t$           $X_o$          $E$

             18              17            -1
             37              35            -2
             28              28             0
             31              37             6
             42              44             2
             36              36             0
             11              15             4
             32              27            -5
             24              25             1
             13              14             1
             21              14            -7
             22              21            -1
             15              18             3
             18              16            -2
             33              38             5
             27              23            -4
             26              28             2
             34              34             0
             25              22            -3
             27              28             1

SUM         520             520             0

MEAN         26.0            26.0           0

$s^2$        67.3            77.6          10.3
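
The student who wishes to verify the figures at the foot of Table 8.1 without hand computation might proceed as in the sketch below. The true and observed scores are those of the table; each error is computed as observed score minus true score, and variances are taken with divisor N, which is the convention that reproduces the tabled values. The variable names are assumptions introduced for the example.

```python
from statistics import mean, pvariance

# True and observed scores of the 20 individuals in Table 8.1.
true_scores = [18, 37, 28, 31, 42, 36, 11, 32, 24, 13,
               21, 22, 15, 18, 33, 27, 26, 34, 25, 27]
observed_scores = [17, 35, 28, 37, 44, 36, 15, 27, 25, 14,
                   14, 21, 18, 16, 38, 23, 28, 34, 22, 28]

# Formula (8.3) rearranged: error = observed score - true score.
errors = [o - t for o, t in zip(observed_scores, true_scores)]

# Variances with divisor N.
var_true = pvariance(true_scores)
var_observed = pvariance(observed_scores)
var_error = pvariance(errors)

# Covariance of true scores and errors; zero means they are uncorrelated.
mean_true, mean_error = mean(true_scores), mean(errors)
cov_true_error = sum(
    (t - mean_true) * (e - mean_error) for t, e in zip(true_scores, errors)
) / len(errors)

print(sum(true_scores), sum(observed_scores), sum(errors))  # 520 520 0
print(round(var_true, 1), round(var_observed, 1), round(var_error, 1))  # 67.3 77.6 10.3
print(cov_true_error)  # 0.0
```

Because the covariance is zero and the errors sum to zero, the observed variance equals the sum of the other two, as formula (8.4) requires.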


THE STANDARD ERROR OF MEASUREMENT 
AND THE RELIABILITY COEFFICIENT 

We are now in position to examine the reliability coefficient as a 
function of errors of measurement. 

Since in a very real sense an observed score is dependent upon a
true score, let us consider the regression of observed scores on true
scores. In Figure 8.1 the true scores and the observed scores of Table
8.1 are plotted on the horizontal and vertical scales, respectively.
In the previous chapter, we learned that it is possible to summarize 
the relationship between two variables by the regression line and 
the standard deviation of the deviations from the regression line, 
the latter being the standard error of estimate. We are not here in- 
terested in the equation of the regression line, since in the real situation 
we do not have true scores from which to estimate observed scores, 


[...] standard error of estimate. The deviations
from the line of regression of observed scores on true scores obviously
are the errors of measurement of column 3, Table 8.1.

[Figure: scatter plot of observed score (vertical axis) against true
score (horizontal axis), with the line of regression of observed scores
on true scores.]

Fig. 8.1. Regression of observed scores on true scores. Vertical distances of
observed scores from regression line are errors of measurement. (See Table 8.1.)

Following equation (7.10), we have, for the general case,

$$s_e^2 = s_{o \cdot t}^2 = s_o^2 (1 - r_{ot}^2). \tag{8.6}$$


Analogously to equation (7.12) we may write $r_{ot}^2 = s_t^2 / s_o^2$. Since the
reliability coefficient $r_1$ also equals the ratio of the true score variance
to obtained score variance, i.e.,

$$r_1 = \frac{s_t^2}{s_o^2}, \tag{8.7}$$

we have $r_{ot}^2 = r_1$.* By substitution in (8.6) we obtain

$$s_e^2 = s_o^2 (1 - r_1). \tag{8.8}$$

The standard deviation of the errors of measurement $s_e$ is, after
taking square roots in (8.8),

$$s_e = s_o \sqrt{1 - r_1}. \tag{8.9}$$

* This is an instructive relationship. It tells us that the reliability coefficient
is the coefficient of determination or the proportion of observed score variance
accounted for by true score variance.

If we solve (8.8) for $r_1$ we get

$$r_1 = 1 - \frac{s_e^2}{s_o^2}, \tag{8.10}$$

a relationship also evident from (8.5) and (8.7).

Thus the relation of the reliability coefficient to observed score 
and true score variances may be expressed by (8.7), and its relation 
to observed score and error variances may be expressed by (8.10), 
provided the errors of measurement are uncorrelated with true scores. 
We shall return to these relations later. 
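
The relations (8.7), (8.9), and (8.10) may be checked numerically against the variances reported at the foot of Table 8.1. The sketch below is offered only as an illustration; the function names are assumptions, and the computation is meaningful only under the stated condition that errors are uncorrelated with true scores.

```python
import math


def reliability_from_true_variance(var_true, var_observed):
    """Formula (8.7): ratio of true score variance to observed score variance."""
    return var_true / var_observed


def reliability_from_error_variance(var_error, var_observed):
    """Formula (8.10): one minus the ratio of error variance to observed variance."""
    return 1 - var_error / var_observed


def standard_error_of_measurement(sd_observed, r1):
    """Formula (8.9): observed-score standard deviation times sqrt(1 - r1)."""
    return sd_observed * math.sqrt(1 - r1)


# Variances from the foot of Table 8.1.
var_true, var_observed, var_error = 67.3, 77.6, 10.3

r1_from_true = reliability_from_true_variance(var_true, var_observed)
r1_from_error = reliability_from_error_variance(var_error, var_observed)
s_e = standard_error_of_measurement(math.sqrt(var_observed), r1_from_true)

print(round(r1_from_true, 3), round(r1_from_error, 3))  # 0.867 0.867
print(round(s_e, 2))  # 3.21
```

The standard error recovered from (8.9) is simply the square root of the error variance, 10.3, which is what the derivation leads one to expect.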


METHODS OF ESTIMATING THE STANDARD 
ERROR OF MEASUREMENT 

Our discussion of errors of measurement to this point has been 
entirely theoretical. This has been necessary in order to show and 
emphasize the relationship between the reliability coefficient and the 
standard error of measurement and the conditions under which the 
concept of reliability has clear-cut meaning. We now turn to the 
practical problem of estimating the standard error of measurement.
If we knew $r_1$, we could, of course, determine $s_e$ from equation (8.9),
but for several reasons a direct estimation of $s_e$ is advisable.

We have seen that in order to estimate $r_1$, we must have two
observed scores for each individual. The same is true in estimating
$s_e$. The pairs of scores for the individuals may be observed by (1)
giving parallel forms of a test or giving the same test twice, or (2)
giving a single test which can be divided into equivalent halves. Let
us consider case (1). For each individual we shall have two equations,
similar to equation (8.3),

$$X_1 = X_t + E_1,$$
$$X_2 = X_t + E_2.$$

If $X_1$ and $X_2$ are equally good estimates of the individual's true score
$X_t$, we may subtract the second equation from the first and obtain

$$X_1 - X_2 = E_1 - E_2.$$

This tells us that the difference between the two observed scores of
an individual is equal to the difference between the errors in the scores,
provided the two are comparable estimates of the individual's true
score. The standard deviation of the series of differences $X_1 - X_2$
may thus be viewed as the standard deviation of the series $E_1 - E_2$.




If the errors are uncorrelated, the standard deviation of the differences
$E_1 - E_2$ is equal to the standard deviation of the sums $E_1 + E_2$.
Assuming that there is no correlation between errors, we may write

$$s_{E_1 + E_2} = s_{X_1 - X_2}, \tag{8.11}$$

i.e., the standard deviation of the sums of errors in observed scores is
equal to the standard deviation of the differences between observed scores.
If we now can assume that the $X_1$ and the $X_2$ series of obtained scores
contribute equally to error variance, the standard error of
measurement $s_e$ attaching to either $X_1$ or $X_2$ as an estimate of $X_t$ will be
$1/\sqrt{2}$* of that given by (8.11). Since $1/\sqrt{2} = .707$, we finally have

$$s_e = .707\, s_{X_1 - X_2}. \tag{8.12}$$


Hence, to estimate $s_e$, when we have pairs of scores observed by giving
parallel forms of a test or by giving the same test twice, we find the
standard deviation of the differences between scores and multiply by
.707. In finding the differences, the $X_1$'s may be subtracted from the
$X_2$'s or the $X_2$'s from the $X_1$'s, but the subtraction must be consistent
and the signs of the differences regarded. Ideally, of course, the sum
of the differences is zero, but practically this is never the case. Otis
(Ref. 34, p. 250) appears to have been the first to point out the
relationship in (8.12).

* Halving the variance results in a standard deviation $1/\sqrt{2}$ times the original,
as can easily be demonstrated.

The procedure in estimating $s_e$ from half-test scores is essentially
the same as the above. Given the half-test scores

$$X_{\frac{1}{2}\mathrm{I}} = \tfrac{1}{2} X_t + E_{\frac{1}{2}\mathrm{I}},$$
$$X_{\frac{1}{2}\mathrm{II}} = \tfrac{1}{2} X_t + E_{\frac{1}{2}\mathrm{II}}$$

for each individual, we obtain by subtraction

$$X_{\frac{1}{2}\mathrm{I}} - X_{\frac{1}{2}\mathrm{II}} = E_{\frac{1}{2}\mathrm{I}} - E_{\frac{1}{2}\mathrm{II}}.$$

Since the standard deviation of the differences between errors is equal
to the standard deviation of their sums, provided the errors are
uncorrelated, we may write

$$s_{E_{\frac{1}{2}\mathrm{I}} + E_{\frac{1}{2}\mathrm{II}}} = s_{X_{\frac{1}{2}\mathrm{I}} - X_{\frac{1}{2}\mathrm{II}}}.$$

Since the sums $E_{\frac{1}{2}\mathrm{I}} + E_{\frac{1}{2}\mathrm{II}}$ are in fact the errors in the total observed
scores, we have

$$s_e = s_{X_{\frac{1}{2}\mathrm{I}} - X_{\frac{1}{2}\mathrm{II}}}. \tag{8.13}$$


Hence, to estimate $s_e$ from half-test scores, we need only to find the
standard deviation of the differences between half-test scores. In
finding the differences we must subtract consistently and must regard
the signs of the differences. Rulon (Ref. 37) appears to have been the
first to point out the relationship in (8.13).
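
The two direct estimates, formulas (8.12) and (8.13), may be sketched as follows. All scores below are hypothetical, the function names are assumptions introduced for the example, and standard deviations are taken with divisor N. Signed differences are kept throughout, as the text requires.

```python
import math
from statistics import pstdev


def sem_from_two_forms(x1, x2):
    """Formula (8.12): 1/sqrt(2), i.e. .707, times the SD of the signed differences X1 - X2."""
    differences = [a - b for a, b in zip(x1, x2)]
    return pstdev(differences) / math.sqrt(2)


def sem_from_half_tests(half_a, half_b):
    """Formula (8.13): the SD of the signed differences between half-test scores."""
    differences = [a - b for a, b in zip(half_a, half_b)]
    return pstdev(differences)


# Hypothetical scores of eight individuals on two parallel forms of a test.
form_1 = [34, 41, 28, 37, 45, 30, 39, 33]
form_2 = [36, 39, 30, 35, 46, 27, 41, 34]

# Hypothetical half-test scores (say, odd and even items) of eight individuals.
odd_items = [17, 21, 13, 19, 23, 16, 20, 16]
even_items = [17, 20, 15, 18, 22, 14, 19, 17]

print(round(sem_from_two_forms(form_1, form_2), 2))
print(round(sem_from_half_tests(odd_items, even_items), 2))
```

The only difference between the two functions is the factor $1/\sqrt{2}$: with two whole tests each score carries half of the error variance of the differences, whereas with two halves the whole of it attaches to the total score.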

The half-test method of estimating $s_e$ is of wide usefulness. When
the half-test standard deviations are equal, this direct method gives
results in exact agreement with those obtained by the indirect method of
formula (8.9); when the standard deviations are unequal, the indirect
method gives a smaller estimate. When the standard deviations of the
observed scores are unequal, no estimate of error can be logically
defended, but the one determined directly from the differences
between scores is at least a safer criterion to use in interpreting observed
scores. Let us point out again, however, that when the assumptions
of equivalent tests or half-tests and random errors of measurement
are unsound, no statement regarding reliability has clear meaning.
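
The comparison between the direct and the indirect half-test estimates can be illustrated with two small sets of hypothetical half-test scores, one with equal and one with unequal standard deviations. In this sketch the indirect estimate is taken to mean formula (8.9) applied to the total scores, with the half-test correlation stepped up by formula (8.1); the function names and data are assumptions introduced for the example.

```python
import math
from statistics import mean, pstdev


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def direct_sem(half_a, half_b):
    """Formula (8.13): SD of the signed differences between half-test scores."""
    return pstdev([a - b for a, b in zip(half_a, half_b)])


def indirect_sem(half_a, half_b):
    """Formula (8.9), with the half-test correlation stepped up by formula (8.1)."""
    r_half = pearson_r(half_a, half_b)
    r_whole = 2 * r_half / (1 + r_half)
    totals = [a + b for a, b in zip(half_a, half_b)]
    return pstdev(totals) * math.sqrt(1 - r_whole)


# Hypothetical halves with equal standard deviations (same values, reordered).
equal_a = [10, 12, 14, 16, 18]
equal_b = [12, 10, 16, 14, 18]

# Hypothetical halves with unequal standard deviations.
unequal_a = [10, 12, 14, 16, 18]
unequal_b = [11, 11, 14, 15, 16]

print(round(direct_sem(equal_a, equal_b), 3), round(indirect_sem(equal_a, equal_b), 3))
print(round(direct_sem(unequal_a, unequal_b), 3), round(indirect_sem(unequal_a, unequal_b), 3))
```

In the first pair the two estimates coincide; in the second, where the halves are positively correlated but unequally variable, the indirect estimate is the smaller of the two.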


SUMMARY 
The reliability theory and estimates that we have discussed rest 
upon two major assumptions: 


a. That the observed scores from parallel forms, test-retest, or half- 
test administration are comparable measures of the same thing, i.e., 
are truly equivalent except for errors of measurement. 

b. That the errors of measurement present in observed scores are
random, i.e., compensating and uncorrelated with true scores or with
themselves.

When the assumptions are satisfied, the standard error of
measurement and the reliability coefficient are related as in equation (8.9).
Ordinarily the assumptions are not fully satisfied, and it is preferable
to estimate $s_e$ directly from the differences between pairs of test scores
or pairs of half-test scores rather than by use of (8.9). When the
assumptions are poorly satisfied, any estimate of reliability is cloudy
in meaning.

In the next section we shall consider the interpretation and use of 
estimates of reliability. 


Interpretation and Use of Estimates of Reliability 


The two hypothetical questions that come up in interpreting the 
reliability of an observed measure are (1) If the observations were 
repeated a large number of times, would the results be in agreement 
to an acceptable degree? (2) Is it reasonable to suppose that the mean 
of the measures would approach the “true dimension" as more and 
more measures were taken? 

The laboratory technician ordinarily can deal with the questions
in a direct way. He can repeat the measurements on an object and
take as many values as he needs to determine a "true value" to within
a specified degree of precision. The measuring process ordinarily
does not change the object, and independence between successive
observations can be maintained so that errors tend to be random. In
this case both the estimation and interpretation of reliability are
straightforward and convincing.

In psychological measurements these questions have to be
approached indirectly, under circumstances which make it difficult to
interpret the answers. Rarely can there be assurance that the
measuring process is not changing the abilities being measured to some
indeterminable extent, and that errors in successive observations are
independent.

In this section we shall consider some of the complexities that arise
in interpreting and using estimates of reliability in psychological
measurements. We shall find that reliability can at best be interpreted only
with reference to a particular group, instrument, experimental situation,
and use of the measures.


THE USE OF $s_e$ IN INTERPRETING AN OBSERVED SCORE


Both $s_e$ and $r_1$ are measures of the reliability of obtained scores.
The reliability coefficient $r_1$ is an abstract measure and may be used
to compare directly the reliabilities of two or more tests or
measurement processes. It has little further usefulness.

Since $s_e$ is a denominate number, being expressed in the unit of
the original measures, it can be used in judging the reliability of a
single observed score, provided two assumptions are satisfied. The
first assumption is that the errors are independent and distributed
normally, i.e., that the errors are truly random. The second is that
the errors are homoscedastic or scattered equally for the various
observed scores. When the assumptions are sound, $s_e$, being the
standard deviation of the errors, enables us to judge the accuracy of
the observed scores and to construct confidence bands for true scores.
Under the assumption that the errors are distributed normally, about
68 per cent of the observed scores will fall within $1s_e$ of the true scores
they represent; about 95 per cent within $2s_e$; and 99.7 per cent within
$3s_e$. Thus, approximate confidence bands for true scores may be
constructed as follows:

68% confidence band for $X_t$: $X_o \pm 1s_e$
95% confidence band for $X_t$: $X_o \pm 2s_e$
99.7% confidence band for $X_t$: $X_o \pm 3s_e$

Any particular confidence band for $X_t$ may be constructed by use of
Table C, Appendix. (Cf. pp. 149-150.)

Because of errors of measurement, an obtained score should be
thought of, not as a single value or point, but as a band extending $ks_e$
above and below its observed value. The value of $k$ will depend, of
course, on how sure we want to be that the band in fact contains the
true score.
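
The construction of such a band may be sketched as follows. The observed score of 28 and the standard error of 3.2 are hypothetical values chosen for illustration, and the function names are assumptions. Where the text refers the student to a table of the normal curve for other levels of confidence, the sketch substitutes the normal distribution in Python's standard library; both the band and the multiplier rest on the assumptions of normal, homoscedastic errors.

```python
from statistics import NormalDist


def confidence_band(observed_score, s_e, k):
    """Band extending k standard errors of measurement above and below the observed score."""
    return (observed_score - k * s_e, observed_score + k * s_e)


def k_for_level(level):
    """Normal-curve multiplier for a two-sided band at the given confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


# Hypothetical observed score and standard error of measurement.
observed_score, s_e = 28, 3.2

for k, label in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = confidence_band(observed_score, s_e, k)
    print(f"{label} band: {low:.1f} to {high:.1f}")

# A band at some other level, here 90 per cent.
k_90 = k_for_level(0.90)
low_90, high_90 = confidence_band(observed_score, s_e, k_90)
print(f"90% band: {low_90:.1f} to {high_90:.1f}")
```

The round multipliers 1, 2, and 3 are the approximations used in the text; the exact multiplier for 95 per cent, for instance, is about 1.96.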

Regarding the assumptions underlying the use of $s_e$, both can be
examined by means of the differences between whole-test or half-test
scores. These differences, being differences between errors, will
be distributed normally if the errors are distributed normally and
are independent. The check works only one way, since normally
distributed differences do not necessarily mean normally distributed
independent errors, but it appears to be sufficient, practically speaking.
The second assumption may be roughly checked by inspection of the
differences. A better check, however, is possible by comparing the
standard deviations of sets of differences at various observed score
intervals. For example, we might compare standard deviations of the
differences corresponding to the scores in the four quarters of the
range. If marked inequalities exist, it would be improper to use or to
report a single standard error of measurement.

It has been the writer's experience that the assumption of
homoscedasticity of errors frequently is questionable. Practically, the
assumption means that a test or measuring instrument must be equally
accurate throughout the range. When the range is relatively great and
the sample is large, it is rather unusual to find the assumption clearly
acceptable.
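
The check by quarters of the range may be sketched as follows. The data are hypothetical and were constructed so that the scatter of the differences grows with the score; the function name is an assumption. The sketch assigns each individual to a quarter according to his first score, which is one of several reasonable choices and is not prescribed by the text.

```python
from statistics import pstdev


def sd_of_differences_by_quarter(x1, x2):
    """SD of the signed differences X1 - X2 within each quarter of the range of X1."""
    low, high = min(x1), max(x1)
    width = (high - low) / 4
    if width == 0:
        raise ValueError("the first scores must not all be equal")
    groups = [[], [], [], []]
    for a, b in zip(x1, x2):
        # Quarter index 0 to 3; the highest score is placed in the top quarter.
        index = min(int((a - low) / width), 3)
        groups[index].append(a - b)
    # An empty quarter has no standard deviation and is reported as None.
    return [pstdev(g) if g else None for g in groups]


# Hypothetical pairs of scores for sixteen individuals.
first_scores = [10, 12, 15, 18, 21, 24, 27, 29, 31, 34, 36, 39, 41, 44, 47, 50]
second_scores = [11, 11, 16, 17, 23, 22, 29, 27, 34, 31, 39, 36, 45, 40, 51, 46]

quarter_sds = sd_of_differences_by_quarter(first_scores, second_scores)
print(quarter_sds)  # [1.0, 2.0, 3.0, 4.0]
```

With inequalities as marked as these, a single standard error of measurement would understate the error at the top of the range and overstate it at the bottom.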


REAL RELIABILITY AND ESTIMATES OF IT 


It is helpful in interpreting reliability and in understanding the 
consequences of unreliable data to distinguish between reliability and 
estimates of reliability. Unless we make the distinction, we are apt to 
fall into the rather common mistake of assuming that circumstances 
which do not affect estimates of reliability do not affect real reliability. 

The fundamental purpose of measurement is to determine a “true 
dimension" of an object. If the determination is good, real reliability 
exists, and estimates of it will confirm the fact. Unfortunately, how- 
ever, estimates of reliability may be satisfactory, yet the determination 
very poor. 

When we have a set of measures of an ability in a group, we ordi- 
narily use them to do one or more of the following things: 

a. To distinguish between the individuals in the group. 
b. To determine the “true” mean or some other 
tistic for the group. 


c. To estimate the “ 
dividual. 


"true" summary sta- 


true" amount of the ability possessed by an in- 


not be distorted. Although error: 

fere, the damage will not be serious i 

dividuals are relatively large as compared with the errors, 
The effect of sample variability or 

reliability coefficient of an instr 


range. If this assumption is true, the standard errors of measurement 
over two different ranges will be e 


Se = sıVl =m, 


Se = SV 1 — rh. 
Equating and dividing we obtain 
St, — r 


= TTE (8.14) 




in which $r_{22}$ is the estimated reliability of an instrument or
measuring process, having a reliability $r_{11}$ in a group in which the
standard deviation is $s_1$, when applied to a group in which the
standard deviation is $s_2$.

To illustrate the use of formula (8.14), suppose that a test shows
a reliability of .84 when applied to a group in which the standard
deviation of the observed scores is 12.0, and that we want to estimate
the reliability of the test for a group which would have a standard
deviation of 8.0. The values for substitution are $s_1 = 12.0$,
$r_{11} = .84$, and $s_2 = 8.0$, so that

$$\frac{12.0}{8.0} = \frac{\sqrt{1 - r_{22}}}{\sqrt{1 - .84}}.$$

Solving for $r_{22}$ we get .64.
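The substitution in formula (8.14) can also be checked numerically. The sketch below is only an illustration: the function name and argument names are hypothetical, and it rests on the same assumption as the formula, namely that the standard error of measurement is the same in both groups.

```python
def reliability_for_new_spread(r_known, sd_known, sd_new):
    """Estimate reliability in a group of different variability, per (8.14).

    Assumes the standard error of measurement is equal in the two groups.
    """
    # Error variance implied by the known group: s_e^2 = s_1^2 * (1 - r_11)
    error_variance = sd_known ** 2 * (1.0 - r_known)
    # Same error variance in the new group: r_22 = 1 - s_e^2 / s_2^2
    return 1.0 - error_variance / sd_new ** 2


# Values from the worked example: r_11 = .84, s_1 = 12.0, s_2 = 8.0
r_new = reliability_for_new_spread(r_known=0.84, sd_known=12.0, sd_new=8.0)
print(round(r_new, 2))  # 0.64
```

Calling the function with a larger `sd_new` than `sd_known` gives the estimate for an increased range, the case for which the text regards the assumption as less plausible.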

Formula (8.14) can, of course, be used to estimate the effect of 
an increased range upon reliability, but the assumption upon which 
it rests is somewhat less plausible in this case. As a matter of fact, 
the assumption appears to be rarely clearly acceptable. The chief 
value of (8.14) is that it emphasizes the sensitivity of reliability to 
variability in the group. Thus, the logic supports the common sense 
notion that when the differences between individuals are relatively 
large, it is not difficult to distinguish between the individuals reliably. 

Regarding the use of measurements to determine summary statistics
of a group, random errors theoretically do not affect the mean, since
they tend to be compensating, but they do inflate the standard
deviation. These facts can be deduced from equations (8.3) and (8.4).
On the other hand, constant errors do not affect the standard deviation,
but they do affect the mean. Since means and standard deviations
ordinarily are used together, both constant and random errors cloud
interpretation to some extent.
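A small simulation may make the contrast between the two kinds of error concrete. Everything in the sketch is an assumption made for illustration: the true scores, the size of the random errors, and the constant error of 3 points are hypothetical and do not come from the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical true scores and independent random errors of measurement.
true_scores = [random.gauss(50, 10) for _ in range(10_000)]
random_errors = [random.gauss(0, 5) for _ in range(10_000)]
CONSTANT_ERROR = 3.0  # hypothetical constant error added to every score

with_random = [t + e for t, e in zip(true_scores, random_errors)]
with_constant = [t + CONSTANT_ERROR for t in true_scores]


def summarize(label, scores):
    """Print the mean and (population) standard deviation of a score list."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    print(f"{label:<22} mean = {mean:6.2f}   sd = {sd:5.2f}")


summarize("true scores", true_scores)
summarize("with random errors", with_random)      # mean about the same, sd larger
summarize("with constant error", with_constant)   # mean shifted, sd unchanged
```

With random errors the mean stays close to the true mean while the standard deviation grows; with a constant error the mean shifts by the constant and the standard deviation is untouched.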

Random errors of measurement decrease or attenuate the coefficient
of correlation between two variables. It can be shown that the
"true" correlation coefficient $r_{\infty\omega}$ of variables X and Y,
estimated from the observed coefficient $r_{xy}$, is

$$r_{\infty\omega} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}} \tag{8.15}$$

in which $r_{xx}$ is the reliability coefficient of the measures of X and
$r_{yy}$ the reliability coefficient of the measures of Y. This correction
of an observed $r_{xy}$ is known as correction for attenuation, the
coefficient $r_{\infty\omega}$ being the hypothetical true correlation
between X and Y if perfect measures (perfect in the sense of freedom
from errors of measurement) of both were available. It will be seen
that the correction is unrealistic. The relationship in (8.15) is
primarily useful because it brings out the effects of errors of
measurement on correlation and thus reemphasizes the great need for
reliable measures in research.
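The effect described by (8.15) is easy to see with a few numbers. The sketch below is illustrative only: the function name is hypothetical, and the observed correlation and reliabilities are invented values, not data from the text.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Apply formula (8.15): observed r divided by the root of the
    product of the two reliability coefficients."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: an observed correlation of .42 between two
# measures whose reliabilities are .70 and .80.
observed = 0.42
corrected = correct_for_attenuation(observed, r_xx=0.70, r_yy=0.80)
print(f"observed r = {observed:.2f}, corrected r = {corrected:.2f}")

# With perfectly reliable measures the correction changes nothing.
print(correct_for_attenuation(observed, r_xx=1.0, r_yy=1.0))
```

The lower the two reliabilities, the further the observed coefficient falls below the corrected one, which is the point the text draws from the formula.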

Both errors of measurement and constant errors hamper the 
attempt to estimate the true ability of an individual. The former 


s. Of far greater importance than estima- 
on of types of errors, Constant errors 


| eliability, but they may be disastrous in 
group comparisons. There is little doubt that part of the measured 


constant errors arising from differences in m 
Unless there is some a 


cerned, the distinction between constant error and change is not im- 
portant; as regards real reliability it is vital. 
Correlation of errors with true scores and with themselves not
only makes interpretation difficult, it destroys the foundation upon
which reliability theory rests. Although it is not possible to determine
whether errors are correlated through examination of observed scores
or their differences, it is not difficult to identify situations that invite
correlation between errors. In general, when there is present in the
testing situation or situations any factor which causes the observed
scores for each individual to be consistently either above or below the
corresponding true score, there is correlation between errors.
Restrictive time limits on tests when speed is not considered part of the
ability tested, fatigue, failure to understand directions when this is
not part of the ability tested, emotional strain, exceptional motivation,
cheating, and distractions are examples of factors which tend to
operate to bring about correlation between errors.

In passing, let us note that observed measures make the individuals
in a group seem more different than they are, since there are
always errors of measurement present. The extent of the exaggeration,
on the average, is seen in the equation $r_{11} = s_\infty^2 / s_o^2$. If an
instrument is characterized by a reliability coefficient of, say, .6, the
true variance of the group is only .6 of the observed variance.
Practically these facts are of little value, since we are limited in our
analyses to the measures we can obtain, but they add depth to
interpretation.


ADVANTAGES AND LIMITATIONS OF THE 
COMMON METHODS OF ESTIMATING RELIABILITY 

The test-retest method is the only one of the three common
methods of estimating reliability which meets the requirement of
repeated measurement, but it is almost sure to be perplexed by constant
error as well as varying changes in "true" scores due to learning and
memory effects. The magnitude of the errors probably is directly
proportional to the complexity of the ability under measurement. It
would thus seem desirable to restrict the method to measures of the
simpler abilities, such as motor coordination, reaction, sensation, and
simple elements of perception.

Theoretically, the parallel-form method is the most generally 
applicable and sound. When two equivalent forms of a test are avail- 
able and can be administered within a period of, say, not less than a 
day or more than a week, the prerequisites of reliability estimation 
are usually best met. Moreover, since parallel forms double the 
sampling of content or test items, the reliability estimate better reflects 
the actual correlation of obtained scores with true scores and the actual 
error of measurement. Practically, however, it has several limitations. 
In practice it is often extremely difficult to construct a test of desired
length and then to construct an "equivalent" test. The attempt
frequently results either in a test containing items so similar to the first
that the method reduces essentially to test-retest, or in a test con- 
taining items so dissimilar that equivalence patently does not exist. 
As another limitation, the amount of time needed for the determination 
of reliability is doubled. This can be a real obstacle, particularly in 
schools and colleges. 

It would seem that much of what is good in the parallel-form 
method can be had in the method of equivalent halves, provided that 
administration of the half-tests is separated by, say, not less than a 


It is well known that stepped-up half-test
estimates of reliability tend to be higher than estimates arrived at by
other methods, particularly so for speeded tests.
that the method is inde 
Splitting a test into halves, While this is true, it is 
important. All methods of estimati 
the sense that they are attem 
never known, the real reliability of the test. 
The great advantage of t 
venience. In the practical sit 
be done. Ordinarily the best Way of splittin 
construct the whole test so that th 
half-test and the even-numbered items the other. Ideally this would
be done on the basis both of editorial study of the items and
experimental tryout.


REPORTING RELIABILITY DATA 


It should be clear that a great deal of information is needed to
interpret reliability estimates. While the minimum amount of
information needed varies to some extent with the use that is made of
the measures whose reliability is of concern, as a rule the following
points should be considered in reporting research:


a. The group of individuals: Needed information includes specification of 
the population, description of the sampling procedures, size of the 
group, and variability of the individuals. Any unusual characteristic 
of the group that might affect the reliability estimate and the use of 
the instrument or process in further samples should be noted. 

b. The testing or experimental situation: In this connection a description 
of all factors in the situations or tests which may give rise to correlated 
errors, constant errors, or changes in “true scores" is needed. Special 
attention should be given to any unique or unexpected factors. 

c. Methods used in estimating reliability: There are a great many ways of 
estimating reliability, and no one way is “best” even in a given situa- 
tion. The researcher should describe the method used and tell why it 
was considered appropriate and what the estimate means. 

d. Equivalence of parallel forms or half-tests: Needed data include means 
and standard deviations of the two or more series of observed scores 
and a note regarding similarity of content of tests or half-tests. 

e. Errors of measurement: The total distribution of differences between
scores or half-test scores and the distributions at several intervals of
the range may help support the assumption of normally distributed
errors uncorrelated with true scores. In support of the assumption of
homoscedasticity of errors, standard errors of measurement at several
intervals of the range should be reported. (In a small sample, of course,
this information is of little value.)


Such information will not only be of great value to the reader,
but will accomplish perhaps even a greater service. Knowing in
advance that the information is needed, the researcher will himself
deal more effectively with the reliability issues in his study than he
otherwise would.


CONCLUDING REMARKS 


Since the assumptions underlying the estimation of reliability are
rarely fully satisfied in practice, all interpretations of reliability should
be cautious, and the use of reliability statistics should be accompanied
with some misgivings until empirically verified. There are no
statistical techniques to take the place of judgment and common sense in
interpreting reliability. This is particularly true in psychological testing,
where qualitative matters, such as content of the test and opportunity
of the individuals in the group to have the common experiences
presupposed by the test, may affect real reliability.




We have confined our discussion of the interpretation of reliability
mainly to the relatively narrow field of psychological testing.
The interpretation of the reliability of questionnaire, rating scale,
score card, historical evidence, and so on, usually involves other
complexities and greater uncertainties. The underlying question,
however, is the same, namely, what reasons are there for supposing that
repeated, independent observations will yield approximately the same
results. The difficulty of the question must not be allowed to detract
from its importance. It is the central question in all research.


Test Item Analysis 


In present-day testing, analytic studies of reliability and validity 
usually begin with the individual items in the test. This sort of study 


There are a great many com- 


, Chapter IX, and various pages 
€ shall begin our brief discussion 


THE ITEM ANALYSIS CHART 


The item scores, half-test scores, and total scores on a mathematics
test and the total scores on a statistics achievement test of 32
students are shown in Table 8.2. Correct responses to the items are
scored "1," incorrect responses "0." Conventional item analysis always
begins with items scored in this way. The item scores in the
body of the table are the basic data for item analysis. All of the
information we can obtain from the mathematics test, except that which
might result from analysis of incorrect responses, is available in the
item scores.


S note, parenthetically, that 
not ordinarily be warranted 


ITEM DIFFICULTY 


There are at least three plausible ways of estimating the difficulty
of a test item. First, as a matter of judgment, we might rank a number
of items from easy to difficult, or estimate a given item more difficult
than a second. When the items have not been tried out, this obviously is
the only means of estimating their difficulty. As a second way, we might
estimate difficulty in terms of the average time needed to complete an
item, the greater the time required the greater the difficulty. This
method has certain practical disadvantages and, at present, is of
little use. The most useful way of estimating the difficulty of an item
is in terms of the proportion of examinees who respond correctly: the
smaller the proportion, the more difficult the item. Inspection of the
proportions at the foot of Table 8.2 indicates that item 27 is the most
difficult for the group and item 1 least difficult.*

The distribution of total test scores and the reliability of the
test are in part functions of item difficulty. Other things being
equal, the most reliable test for an entire group is the test containing
items of 50 per cent difficulty, i.e., items of maximum variance.
Tests containing items of less than 50 per cent difficulty discriminate
more reliably among the individuals of better than average ability in
the group, and hence are more reliable than easier tests in selecting a
few from the group for scholarships or "honors." Such tests tend to
yield distributions skewed to the right. On the other hand, tests
containing items of greater than 50 per cent difficulty tend to yield
distributions skewed to the left, and hence discriminate more reliably
among the individuals of less than average ability in a group.
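The claim that items of 50 per cent difficulty have maximum variance, and the footnote's count of the differences an item brings out in a group of 10, can both be tabulated in a few lines. This is a minimal sketch; the function names are hypothetical and the group of 10 is the one used in the footnote.

```python
def item_variance(p):
    """Variance of an item scored 0 or 1 when a proportion p pass it."""
    return p * (1.0 - p)


def differences_brought_out(n_pass, n_fail):
    """Number of pass/fail pairs: each individual passing is
    distinguished from each individual failing."""
    return n_pass * n_fail


GROUP_SIZE = 10
print("pass  fail  p     pq    differences")
for n_pass in range(GROUP_SIZE + 1):
    n_fail = GROUP_SIZE - n_pass
    p = n_pass / GROUP_SIZE
    print(f"{n_pass:>4}  {n_fail:>4}  {p:.2f}  {item_variance(p):.2f}  "
          f"{differences_brought_out(n_pass, n_fail):>6}")
```

Both columns peak at 5 passing and 5 failing and fall to zero for an item passed or failed by everyone.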


ITEM DISCRIMINATION


When the scores on a particular item are correlated positively
with total scores on the test, the item is said to be discriminating.
There are various ways of estimating the discriminative power of
an item. The simplest way is that of subtracting the proportion of
correct responses in the half of the group having the lowest total
scores from the proportion of correct responses in the half having the
highest. Applying this method to, say, item 6 in Table 8.2 we would


difficulty in terms of percentage succeeding. 
i ich is passed by all of 


ferences Him variance indicates that the 1 
Shown eee brought out. That items of 50 per cent difficulty s 
If 5 individuals in a group of 10 pass an item and 5 fail it, 25 differences
are brought out by the item, since each individual passing is different
from each individual failing. If 6 pass and 4 fail, only 24 comparisons
are permitted; if 7 pass and 3 fail, only 21; and so on. The illustration
can readily be generalized.


TABLE 8.2
Item Scores, Total Scores, and Half-Test Scores of 32 Students on a
Mathematics Background Test and Total Scores on a Statistics Test

[The body of item scores (mathematics test items 1 to 30) and the odd
and even half-test columns are not legible here. The total-score columns
are given below in the order of the table.]

STUDENT    MATHEMATICS TEST, TOTAL    STATISTICS TEST
    1                30                     92
    2                29                    100
    3                28                    105
    4                27                     94
    5                25                     88
    6                22                     84
    7                21                     93
    8                20                     98
    9                20                     93
   10                20                     80
   11                19                     82
   12                17                     90
   13                17                     81
   14                15                     94
   15                15                     80
   16                15                     72
   17                14                     82
   18                14                     81
   19                14                     79
   20                13                     86
   21                13                     85
   22                13                     78
   23                13                     78
   24                12                     87
   25                12                     84
   26                11                     93
   27                11                     78
   28                11                     74
   29                10                     75
   30                 6                     84
   31                 6                     62
   32                 2                     69

[Rows at the foot of the table, as legible; the alignment of these values
with item numbers is not recoverable.]

PROPORTION CORRECT, p
.28 .12 .22 .16 .38 .81 .25 .66 .81 .12 .59 .53 .81 .47 .25 .25 .28 .28 .34 .69 .81 .38 .75 .47 .81 .91 .53 .34 .50

pq
.20 .11 .17 .13 .24 .15 .19 .22 .15 .11 .24 .25 .15 .25 .19 .19 .20 .20 .22 .21 .15 .24 .19 .25 .15 .08 .25 .22 .25


obtain a discrimination index of 0, since the proportion of correct
responses is 10/16 in both upper and lower halves of the total test
score distribution. For item 23 we would obtain an index of 14/16 −
6/16 or .50. A similar estimate could be made on the basis of upper and
lower thirds. All such indexes may vary from −1 to 1. Items passed
or failed by all in a group obviously have no discriminative power.
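The upper-lower halves index can be computed directly from item scores and total scores. The sketch below is one way of doing so; the function name is hypothetical, and the eight examinees are invented for illustration, since the item scores of Table 8.2 are not reproduced here. Ties in total score are broken by the original order of the examinees, which is an arbitrary choice.

```python
def upper_lower_index(item_scores, total_scores):
    """Proportion passing the item in the upper half of the group minus
    the proportion passing in the lower half (halves by total score)."""
    if len(item_scores) != len(total_scores):
        raise ValueError("need one item score and one total per examinee")
    # Positions of examinees ranked from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    half = len(order) // 2  # with an odd group the middle examinee is dropped
    upper = [item_scores[i] for i in order[:half]]
    lower = [item_scores[i] for i in order[-half:]]
    return sum(upper) / half - sum(lower) / half


# Hypothetical group of 8 examinees (not taken from Table 8.2).
totals = [30, 27, 25, 22, 15, 13, 11, 6]
item = [1, 1, 1, 0, 1, 0, 0, 0]
print(upper_lower_index(item, totals))  # 0.5

# The same arithmetic as the text's example for item 23: 14/16 - 6/16
print(14 / 16 - 6 / 16)  # 0.5
```

An item passed by everyone, or failed by everyone, gives an index of 0, in line with the remark that such items have no discriminative power.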

The best method of estimating discrimination is that of biserial
correlation. As we have seen, there are two biserial coefficients, $r_{pb}$
and $r_b$, the latter being based upon the assumption that the
dichotomized variable is normally distributed. Although $r_{pb}$ would seem
the better and more logical coefficient, since item scores as observed are
really dichotomous, the fact that $r_b$ can be approximated quickly lends
to it a practical advantage. Flanagan (Ref. 16) has devised a way of
approximating $r_b$ from the 27 per cent of the group scoring highest on
the whole test and the 27 per cent scoring lowest. We reproduce an
abridgment of a table based upon Flanagan's study in our Table 8.3.

Let us find $r_b$ for item 23 of our mathematics test by use of
Flanagan's method. Since 27 per cent of our group is about 9, we
determine the proportion of correct responses among the upper 9
and lower 9 individuals in the group. The proportion among the
upper 9 is 9/9 or 1.00; and in the lower 9, 2/9 or .22. Entering Table
8.3 at column 98 and going down to row 22, we read .80. This is an
approximate value of the normalized biserial coefficient of correlation
of item 23 with the total test scores.

TABLE 8.3*
Normalized Biserial Coefficients† of Correlation as Determined
from Proportions of Correct Responses in Upper and Lower
27 Per Cent of the Group

Columns: PROPORTION OF CORRECT RESPONSES IN THE UPPER 27 PER CENT‡
Rows: PROPORTION OF CORRECT RESPONSES IN THE LOWER 27 PER CENT

     02 06 10 14 18 22 26 30 34 38 42 46 50 54 58 62 66 70 74 78 82 86 90 94 98
02 | 00 19 30 37 43 48 51 55 58 61 63 66 68 70 72 73 75 77 79 80 82 84 86 88 91 | 02
06 |    00 11 19 26 31 36 40 44 47 50 53 56 59 61 64 66 68 71 73 76 78 81 84 88 | 06
10 |       00 08 15 21 26 30 34 38 41 45 48 51 54 57 60 63 65 68 71 74 77 81 86 | 10
14 |          00 07 12 18 22 27 31 34 38 42 45 48 51 54 57 60 63 67 70 74 78 84 | 14
18 |             00 06 11 16 20 25 28 32 36 39 43 47 49 53 56 60 63 67 71 76 82 | 18
22 |                00 06 10 15 19 23 27 31 34 38 42 45 49 52 56 60 63 68 73 80 | 22
26 |                   00 05 09 14 18 22 26 30 33 37 41 44 48 52 56 60 65 71 79 | 26
30 |                      00 04 09 13 17 21 25 29 33 37 40 44 49 53 57 63 68 77 | 30
34 |                         00 04 09 13 17 21 25 29 33 37 41 45 49 54 60 66 75 | 34
38 |                            00 04 08 13 16 20 25 29 33 37 42 47 51 57 64 73 | 38
42 |                               00 04 08 12 16 20 25 29 33 38 43 48 54 61 72 | 42
46 |                                  00 04 08 12 16 21 25 30 34 39 45 51 59 70 | 46
50 |                                     00 04 08 13 17 21 26 31 36 42 48 56 68 | 50
54 |                                        00 04 08 13 17 22 27 32 38 45 53 66 | 54
58 |                                           00 04 09 13 18 23 28 34 41 50 63 | 58
62 |                                              00 04 09 14 19 25 31 38 47 61 | 62
66 |                                                 00 04 09 15 20 27 34 44 58 | 66
70 |                                                    00 05 10 16 22 30 40 55 | 70
74 |                                                       00 06 11 18 26 36 51 | 74
78 |                                                          00 06 12 21 31 48 | 78
82 |                                                             00 07 15 26 43 | 82
86 |                                                                00 08 19 37 | 86
90 |                                                                   00 11 30 | 90
94 |                                                                      00 19 | 94
98 |                                                                         00 | 98
     02 06 10 14 18 22 26 30 34 38 42 46 50 54 58 62 66 70 74 78 82 86 90 94 98

* This table is abridged from J. C. Flanagan's table of normalized biserial
coefficients originally prepared for the Cooperative Test Service. It is
included here with the generous permission of Dr. Flanagan and the
Educational Testing Service of Princeton, New Jersey.

† Decimal points are omitted.

‡ If the proportion of correct responses in the lower 27 per cent exceeds that in
the upper, enter the table with the lower 27 per cent proportion at the top and attach
a negative sign to the coefficient.


As noted at the foot of Table 8.3, when the proportion of correct
responses in the lower 27 per cent exceeds that in the upper, we enter
the table with the lower 27 per cent proportion at the top and attach a
negative sign to the coefficient. Items showing negative discrimination
tend to be worse than useless in a test. On the face of it, correct
responses to such items should be scored wrong and incorrect responses
right, but such procedure would raise several knotty philosophical
issues. Examination of negatively discriminating items usually reveals
flaws and inconsistencies which should have been detected in
constructing the item. Statistical analysis is no substitute for careful
construction and editing of items. At the same time, careful
construction of items is no substitute for statistical analysis. The two
are best thought of as complementary.

As a rule, $r_b$ is considerably larger than $r_{pb}$. For item 23, where
$r_b$ is estimated to be .80, $r_{pb}$ is .64, as the student can verify by
use of the statistics, $\bar{Y}_p = 19.4$, $\bar{Y}_q = 10.7$, $pq = .24$, and
$s_y = 6.7$, and formula (7.17). The difference between coefficients tends
to be of little practical concern, however, since a ranking of items in
order of discriminative power on the basis of any of the coefficients
would ordinarily correspond rather closely to the ranking on the basis of
another. Thus, if the purpose of the analysis were to delete a given
number of items of low discrimination, one method would tend to be about
as good as another. In fact, as a rule, about the same items would be
eliminated by simpler methods, such as that of upper and lower
halves. For more accurate analysis or for theoretical discussion,
however, $r_{pb}$ should be used, since it makes use of all of the
information available.

Other things being equal, the greater the discrimination of the
items, the more reliable the whole test. We digress for a moment to
consider terminology. The terms item reliability and item validity are
sometimes used synonymously with item discrimination. It would
seem preferable to limit reliability to the sense of correlation between
repeated measures or similar measures and to restrict validity to mean
consistency of items with criteria outside of the test.

Although our discussion to this point has been in terms of test
items and continuous total test scores, the concept of item
discrimination has wide application. We might, for example, correlate the
"yes" or "no" responses on a questionnaire item with some measure of
response to parts or all of the questionnaire. Whenever we can
correlate, by one of the methods discussed in the preceding chapter, the
responses to a single item with some larger measure of performance
on an instrument, we can obtain an index of the discriminative power
or consistency of the item. The deletion of inconsistent items will
generally improve the instrument.


ITEM INTERCORRELATION 


The intercorrelations of the items on a test can be determined by
fourfold point methods. Let us find the correlation between items 12
and 15 of Table 8.2. We first enter the item scores as tallies in a
fourfold table as shown below:

                      Item 12
                      0     1
    Item 15    1      8     9  |  17
               0     11     4  |  15
                     19    13  |  32

By formula (7.18) we obtain

$$r_\phi = \frac{9 \times 11 - 8 \times 4}{\sqrt{17 \times 15 \times 19 \times 13}} = .27.$$
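The fourfold point computation for items 12 and 15 can be reproduced from the four cell tallies. The sketch below is illustrative; the function and argument names are hypothetical, and the arithmetic is that of the worked example above (cross-product difference over the root of the product of the four marginal totals).

```python
import math


def fourfold_point_r(both_pass, first_only, second_only, both_fail):
    """Fourfold point (phi) coefficient from the four cell frequencies.

    both_pass   : passed both items
    first_only  : passed the first item, failed the second
    second_only : failed the first item, passed the second
    both_fail   : failed both items
    """
    numerator = both_pass * both_fail - first_only * second_only
    marginals = ((both_pass + first_only) * (second_only + both_fail) *
                 (both_pass + second_only) * (first_only + both_fail))
    if marginals == 0:
        raise ValueError("an item passed or failed by all has no correlation")
    return numerator / math.sqrt(marginals)


# Tallies for item 15 (first) and item 12 (second) from the fourfold table:
# 9 passed both, 8 passed 15 only, 4 passed 12 only, 11 failed both.
r = fourfold_point_r(both_pass=9, first_only=8, second_only=4, both_fail=11)
print(round(r, 2))  # 0.27
```

Interchanging the two items leaves the coefficient unchanged, since only the roles of `first_only` and `second_only` are swapped.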


Item intercorrelation has already been suggested as one way of
determining the consistency of cross-checking items in a questionnaire
(p. 167). It may be useful in eliminating interdependent items. When
the intercorrelations of the items in a test are high and the items
approximately equal in difficulty, the distribution of total test scores
tends to be bimodal and thus to discriminate sharply at the middle of
the scale. (See Ref. 19, p. 451.) This fact may be helpful in selecting
items for a test designed to discriminate sharply between individuals
of moderate ability.


ESTIMATES OF TEST RELIABILITY BASED ON ITEM VARIANCE 


Kuder and Richardson (Ref. 29) have shown that an estimate of
test reliability may be made from the variance of the total scores on a
test and the sum of item variances. Their most generally applicable
formula may be expressed

$$r_{11} = \frac{n}{n - 1}\left(\frac{s_o^2 - \sum pq}{s_o^2}\right) \tag{8.16}$$

in which $n$ is the number of items in the test, $s_o^2$ is the variance of
the total scores, and $\sum pq$ is the sum of the products of the proportions
passing and failing each item, i.e., the sum of the item variances.

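Let us use the Kuder-Richardson formula to estimate the reliability
of the mathematics test of Table 8.2. Substituting $n = 30$,
$s_o^2 = 44.4$, and $\sum pq = 5.6$ in the formula, we have

$$r_{11} = \frac{30}{29}\left(\frac{44.4 - 5.6}{44.4}\right) = .90.$$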


If the items in a test are in fact equal in difficulty, formula (8.16)
simplifies to

$$r_{11} = \frac{n}{n - 1}\left(\frac{s_o^2 - n\bar{p}\bar{q}}{s_o^2}\right) \tag{8.17}$$

in which $\bar{p}$ is equal to $\bar{X}/n$ and $\bar{q}$ equals $1 - \bar{p}$.
If we apply formula (8.17) to the mathematics test data of Table 8.2,
the values for substitution are

$$n = 30, \quad s_o^2 = 44.4, \quad \bar{p} = .54, \quad \bar{q} = .46$$

so that

$$r_{11} = \frac{30}{29}\left(\frac{44.4 - 30(.54)(.46)}{44.4}\right) = .86.$$


Formula (8.17) requires much less information than (8.16). Empirical
study indicates that it gives fairly good results, even when item
difficulties vary considerably, as in the example.
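Both Kuder-Richardson estimates reduce to a line of arithmetic once the summary values are in hand. The sketch below uses the values of the two worked examples (30 items, total-score variance 44.4, sum of item variances 5.6, mean proportion passing .54); the function names are hypothetical labels for formulas (8.16) and (8.17).

```python
def kuder_richardson_8_16(n_items, total_variance, sum_pq):
    """Formula (8.16): uses the sum of the separate item variances pq."""
    return (n_items / (n_items - 1)) * (total_variance - sum_pq) / total_variance


def kuder_richardson_8_17(n_items, total_variance, mean_p):
    """Formula (8.17): treats every item as having the mean difficulty."""
    mean_q = 1.0 - mean_p
    pooled_item_variance = n_items * mean_p * mean_q
    return ((n_items / (n_items - 1)) *
            (total_variance - pooled_item_variance) / total_variance)


# Values for the mathematics test as given in the worked examples.
print(f"{kuder_richardson_8_16(30, 44.4, 5.6):.2f}")   # 0.90
print(f"{kuder_richardson_8_17(30, 44.4, 0.54):.2f}")  # 0.86
```

The second estimate is the lower of the two here because the pooled quantity 30(.54)(.46) is larger than the actual sum of item variances, which is what happens when item difficulties vary.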

The Kuder-Richardson methods, like the half-test method, do 
not provide estimates of reliability in the sense of agreement between 
repeated measurements. Since they utilize scores obtained from a 
single administration of a test, it would appear likely that to some 
extent they overestimate reliability. They do not take into account 
ordinary day-to-day variation in individuals and are susceptible to 
correlation between errors of measurement. Like the half-test method, 
they are inappropriate for speeded tests. 


ITEM VALIDITY 


A test item may be said to be formally valid when it is consistent
with content that has been taught, specifications drawn up in advance,
the opinions of experts, and so forth. It is experimentally valid when
it correlates with a criterion variable outside of the test.

Any of the methods of estimating the discriminative power of an
item, mentioned earlier, may be used in estimating item validity. For
example, to determine the experimental validity of a mathematics test
item of Table 8.2 as a predictor of achievement in statistics, as measured,
we must find out whether the item scores are correlated with the
statistics achievement scores. Let us consider item 12. The achievement
scores for those passing and failing the item are shown below.

    Item 12   1: 92, 100, 105, 94, 88, 84, 93, 98, 80, 82, 81, 80, 86
              0: 93, 90, 94, 72, 82, 81, 79, 85, 78, 78, 87, 84, 93, 78,
                 74, 75, 84, 62, 69

The mean achievement score of those passing the item is 89.5; that of
those failing the item 80.9. The respective proportions of passes and
failures are .41 and .59, and the standard deviation of the achievement
scores is 9.2. When we substitute these values in formula (7.17) we
obtain

$$r_{pb} = \frac{(89.5 - 80.9)\sqrt{.41 \times .59}}{9.2} = .46.$$

We might, of course, have used Flanagan's shortcut, or the method of
upper-lower halves in our analysis.
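The validity coefficient for item 12 can be recomputed from the achievement scores listed above rather than from the rounded summary values. The sketch follows the arithmetic of the worked example (difference between the two means, times the root of pq, over the standard deviation of all scores); the function name is hypothetical, and the standard deviation is taken with N in the denominator, which is an assumption that reproduces the 9.2 quoted in the text.

```python
import math
import statistics

# Statistics achievement scores of those passing and failing item 12.
passed = [92, 100, 105, 94, 88, 84, 93, 98, 80, 82, 81, 80, 86]
failed = [93, 90, 94, 72, 82, 81, 79, 85, 78, 78, 87, 84, 93, 78,
          74, 75, 84, 62, 69]


def point_biserial(scores_pass, scores_fail):
    """Point biserial r between a pass/fail item and a continuous score."""
    everyone = scores_pass + scores_fail
    p = len(scores_pass) / len(everyone)   # proportion passing
    q = 1.0 - p                            # proportion failing
    sd = statistics.pstdev(everyone)       # SD with N in the denominator
    mean_gap = statistics.fmean(scores_pass) - statistics.fmean(scores_fail)
    return mean_gap * math.sqrt(p * q) / sd


print(round(statistics.fmean(passed), 1))          # 89.5
print(round(statistics.fmean(failed), 1))          # 80.9
print(round(point_biserial(passed, failed), 2))    # 0.46
```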

The validity coefficients of the majority of the items in Table 8.2
are no better than chance magnitude. As a rule, the validity
coefficients of the items in a predictor test are low and, when the sample is
small, unreliable. Other things being equal, the higher the validity
coefficients of the items the greater the validity or predictive value of
the test. The validity of tests usually can be improved to some extent
through item analysis; however, it is common experience that items
showing substantial validity are hard to find.

It is often the case that the criterion variable can best be observed 
in two categories. In validating the items used in a personnel selec- 
tion test, for example, the criterion variable, success on the job, may 
be available in two categories, “succeeded” or “failed.” When this is 
the case, the fourfold point methods of correlation are called for. If 
the criterion variable is observed in more than two categories, con- 
tingency correlation can be used. 


CONCLUDING REMARKS 


Item statistics are relatively much affected by sampling fluctua- 
tions; and, as a rule, those derived from a small sample can be applied 
with little confidence to other samples. The relation of item statistics 
to whole-test statistics is complex. The content, difficulty, discrimina- 
tive power, and intercorrelations of items interact to establish the 
reliability and validity of a test, but the total relationship is too com- 
plex to make use of in practical work. 

Item analysis is not, however, of theoretical interest only. When
samples are of fair size, say about 60 or more, item analysis yields
reliable information indispensable in test improvement. In quite small
samples, item analysis ordinarily results in deleting or changing some
items and a somewhat better test for future use. The relation of a
particular item statistic, such as a measure of difficulty or an index
of discrimination, to whole-test statistics may be useful in developing
tests for special purposes. Several such possibilities were brought out
in our discussion. Furthermore, item analysis will usually result in
improved skill in item construction. It is always instructive to examine
the content, wording, and position of the various items in a test after
their statistics are known.

Nothing has been said about "distractor" or incorrect-response
analysis or about the correction of test scores for chance success on
the items. (See Ex. 26.) These topics are well treated in Reference 6.
Incorrect responses may indicate mental sets and modes of response
which underlie errors in thinking. (See Ref. 45.)


EXERCISES 


1. In a study to determine the number of working hours per week, a na- 


tional professional organization sent out about 15,000 questionnaires to 
its members. About 2,500 of the questionnaires were returned, showing 


- What are the advantages and | 


Reliability and Validity of Statistical Evidence / 213 


an average working week of about 48 hours. Illustrate the meaning of 
sampling, bias, and measurement errors in this situation. (Other, more 
careful, studies actually show an average working week of about 42 hours 


in this profession.) 


- What seems to you to be the most serious limitation of formal validity ? 


Of experimental validity ? 


. A testing agency announced that it had developed a highly valid test of 


intelligence. What information is needed to give the statement meaning? 


- The scores of 32 graduate students on a mathematics background test 


and on a statistics achievement test are shown in Table 8.2. How would 
you determine the coefficient of validity of the mathematics test as a 
predictor of achievement in statistics? How would you determine the 


reliability coefficient of the mathematics test ? 


- Twenty-four teachers in an elementary school were rated by two super- 


visors. If the ratings were numerical and independent, how could their 
reliability be estimated ? 


- What assumptions are made when the reliability of a test is estimated 


from half-test scores? If r, = .60, what is the estimate of r1? 


5 
An achievement test is reduced to one fourth of its original length. If the 
original test had a reliability coefficient of .96, what coefficient would be 


expected for the quarter-test, assuming the quarter-test comparable to 
the original? 


8. Suppose that an adjustment inventory requires 1 hr. to administer and
that its reliability for a given group is .40. How many hours long would
the inventory have to be to give a reliability of .90 for the group, assuming
each hour's work equivalent?
9. Describe a situation, not mentioned in the text, in which each of the
following would be appropriate.

a. Formal validation.

b. Experimental validation.

c. Test-retest estimation of reliability.

d. Parallel-forms estimation of reliability.

e. Equivalent-halves estimation of reliability.

f. Correction of a reliability coefficient for lengthening a test.

g. Correction of a reliability coefficient for shortening a test.


10. What are the advantages and limitations of each of the common methods
of estimating reliability?

11. List at least three points that should be kept in mind in interpreting
reliability. Which do you believe the most important?

12. In estimating reliability by correlational methods we correlate scores
whose components are considered to be true score plus random error.
Hence, the sum of cross products in deviation form, $\sum x_1 x_2$, may be
expressed, $\sum x_1 x_2 = \sum x_t^2 + \sum x_t e_1 + \sum x_t e_2 + \sum e_1 e_2$.
Show that if the errors are correlated with themselves or with true scores,
the correlation coefficient overestimates reliability.

13. How would you estimate $s_e$ for the mathematics test scores of Table 8.2?

14. The standard error of measurement of the total scores in Table 8.2 is
about 1.2. Which of the observed scores represent true scores that are
very probably above 15? Which represent true scores that are very
probably below 15? Interpret “very probably” as you wish.

15. Given the data $N = 1{,}000$, $s_x = 12.0$, $s_e = 5.0$:

a. How many of the observed scores fall more than 5 points from the true
scores they represent? 10 points?

b. If an individual has an observed score of 75, how sure can you be that
his true score is not 70 or below? 80 or above?

c. What are the assumptions upon which your answers are based?


16. A test applied to a group in which the standard deviation of the observed
scores is 15.0 has a reliability coefficient of .96. What coefficient would
you expect to observe if the test were given to a group characterized by a
standard deviation of 5.0? What assumption underlies your estimate?

17. Suppose that the reliability of college grades is about .50 and the
reliability of high school grades about .60. If the observed correlation
between the two is .50, what would the theoretical correlation be if both
were perfectly reliable? Why is this information of no practical value?

18. Criticize each of the following statements. What are the qualifications
needed to make each statement true?

a. A test having a reliability coefficient as low as .5 can be used to measure
reliably mean differences between groups.

b. An observed score at any place on the scale of scores can be interpreted
in terms of $s_e$.

c. Constant errors do not affect the reliability of a test.

d. The reliability coefficients of two tests permit comparison of the
reliability of the tests.


19. It is sometimes suggested that an observed score be thought of as a zone
or band on the scale of scores, rather than as a point. What is the merit
of the suggestion?

20. Apply the upper-lower halves method, $r_{pb}$, and Flanagan's method of
estimating item discrimination to one or more of the items of Table 8.2
and compare results.

21. By the same methods estimate and compare validity indices of several of
the items as predictors of achievement in statistics.

22. Find the intercorrelation of two of the items of Table 8.2.

23. In a preliminary investigation of the study habits of college freshmen, a
researcher gave an inventory comprising 60 items to about 200 freshmen.
How would he check the reliability and validity of the items if his purpose
is to find out whether there is relationship between study habits and
grades?

24. Show that if 50 in a group of 100 pass an item and 50 fail, more individual
differences are brought out than by any other numbers passing and failing.

25. Sketch frequency polygons showing marked negative skewness,
bimodality, and positive skewness. Which one magnifies differences
between individuals of high ability, which differences between individuals
of low ability, and which differences between individuals of moderate
ability?

26. The two most widely used formulas for the correction of test scores for
chance success on items are $S = R - \frac{W}{n - 1}$ and $S' = R + \frac{O}{n}$, in
which S is the total score, R is the number of correct responses, W the
number of incorrect responses, O is the number of items omitted, and n is
the number of options on each item. Show by numerical illustration or
prove in general that the two sets of total scores S and S' are perfectly
correlated.


CHAPTER IX* 


Statistical Inference 


Statistical data are subject to constant errors owing to bias and 
to the variable errors of measurement and of sampling. In the preceding 
chapter we considered errors of bias and of measurement at some length. 
In this chapter we shall deal with sampling errors. 

The theories of errors of measurement and of sampling are not 
unrelated. When an individual's score on a test is taken as an estimate 
of his true score, an error of measurement attaches to the obtained score.
Further scores obtained under comparable conditions usually differ
from the first and from each other. Analogously, when a sample
statistic, such as a mean, is taken as an estimate of the corresponding but
unknown true statistic of the population, a sampling error attaches to 
the sample statistic. The statistic in further samples can be expected 
to show variation, to some extent. Since the scores or measures in a 
sample are rarely if ever perfectly reliable, a sample statistic usually 
contains both sampling and measurement errors. It is extremely difficult, 
however, to treat both types of variable errors in the same discussion. 


* It is suggested that this chapter be read for general ideas, rather than for 
complete understanding, and returned to as needed during study of later chapters. 




For this reason, we shall first consider sampling errors, assuming that 
the measures in the sample are perfectly reliable. Later we shall elab- 
orate certain of the effects of errors of measurement upon sample 
statistics. 

In the preceding pages we have been concerned primarily with the 
calculation and interpretation of descriptive statistics, such as the mean, 
standard deviation, and coefficient of correlation. In several of the 
discussions, however, important concepts in sampling theory were 
anticipated and briefly previewed. As a background for the present 
chapter, the student may find it helpful to reread at this time pp. 35-37, 
56, 78-79. 


Sampling Theory and Statistical Inference 


The concepts underlying sampling theory previewed in earlier 
chapters may conveniently be summarized in seven paragraphs: 


a. A sampling problem exists whenever the conclusions derived from 
observation of a limited number of individuals are applied to a larger 
(usually much larger) number of individuals. The former number con- 
stitutes the sample; the latter the population. The generalized conclu- 
sions commonly are called statistical inferences. Sampling theory and 
procedures are concerned with the conditions under which sound 
inferences about the characteristics of a population can be drawn from 
a sample. 

b. The most important of these conditions is that of a random sample of 
individuals from a clearly specified population. Unless this condition 
is met, sample evidence has no demonstrable generality. A random 
sample can be considered to be an unbiased and therefore representa- 
tive sample, and permits inferences of a determinable degree of 
certainty. 

c. When the method of sampling assures every individual in the population
the same chance of being drawn as any other individual, the
sample is considered to be random. This sort of sampling is called 
simple random sampling. 

d. A random sample provides an approximation to the population, the 
goodness of the approximation depending upon the size of the sample. 
As the size of the sample is increased, the form and the statistics of the 
sample distribution approach those of the population. When the size 
of the sample is large—say 100 at least—the frequency polygon is 
usually smooth enough to give a good idea concerning the form of the 
population distribution. The distribution of even a quite large sample, 
however, can be expected to diverge from population form to some 
extent owing to the fluctuations of sampling. 

e. Like the sample distribution, sample statistics are subject to chance
fluctuations. A sample statistic does not ordinarily agree with that
derived from a second sample or with the population parameter.*
The amount of fluctuation depends on the variability of the population
and the size of the sample.

f. Certain statistics tend to fluctuate less than others in successive 
samples. In sampling from a normal population, the mean fluctuates 
less than do other measures of central tendency and therefore provides 
the most reliable estimate of the central tendency of a normal popula- 
tion. Similarly, the standard deviation is the most reliable estimate 
of the variability of a normal population. In other words, the mean 
and standard deviation are more reliable than other similar measures, 
because they tend to approximate more closely the corresponding 
population parameters and hence to fluctuate less from sample to 
sample. It is also true that the product-moment coefficient of correla- 
tion is the most reliable estimate of linear relationship in a normal 
bivariate population. 

g. An inference based upon sample evidence rests upon three points: 
(1) the individuals in the sample are representative of a population of 
individuals; (2) certain facts are observed to characterize the in- 
dividuals of the sample; (3) therefore, probably and approximately, 
the observed facts characterize the individuals of the population. 
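
Paragraphs (d) through (f) can be checked informally by a small sampling experiment. The sketch below is an illustration only, not a procedure from the text: the normal population with mean 50 and standard deviation 10, the sample sizes, the number of samples, and the function name `spread_of_statistic` are all assumptions chosen for the demonstration.

```python
import random
import statistics

def spread_of_statistic(stat, sample_size, n_samples=2000, seed=1):
    """Standard deviation of a statistic over repeated random samples.

    The population is assumed normal with mean 50 and standard
    deviation 10 (hypothetical values chosen for illustration).
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        sample = [rng.gauss(50, 10) for _ in range(sample_size)]
        values.append(stat(sample))
    return statistics.pstdev(values)

if __name__ == "__main__":
    for n in (25, 100, 400):
        mean_spread = spread_of_statistic(statistics.mean, n)
        median_spread = spread_of_statistic(statistics.median, n)
        print(f"N={n:3d}  spread of means={mean_spread:.2f}  "
              f"spread of medians={median_spread:.2f}")
```

If the experiment behaves as the paragraphs lead us to expect, the spread of each statistic should shrink as the sample size grows, and at every sample size the means should fluctuate somewhat less than the medians.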


THE ROLE OF STATISTICAL INFERENCE 


When exact meanings are given to the words probably and ap- 
proximately, the nature of statistical inference is well delineated in 
paragraph (g), above. These meanings will emerge a little later in our 
discussions of probability, sampling distributions, hypotheses, and 
estimation. Preliminary to these more technical topics, let us consider 
informally the role of inference in research. 

The possibility of making inferences about a population from the 
information provided by a sample is fundamental in research work. 
It is an exciting possibility. In Walker's words (Ref. 48, p. 229): 

The idea that information obtained from a relatively small number
of cases actually examined can be used to throw light on the
characteristics of a vast universe which has not been examined is an
exciting idea, [...] renders it commonplace. [...] of some characteristic
of [...] a measure of the amount of [...], is still more remarkable.

* A [...] unknown, measure in the population is called a parameter. [...]
knowledge about parameters is inferred from statistics [...] indicated by
Greek letters.

There are two broad types of questions regarding populations which we
attempt to answer by inferences drawn from the sample.

First, we may ask whether some hypothesis or belief about the population
is consistent with the evidence provided by the sample. For example,
we find that, say, 60 per cent of the voters in a random sample
from a specified population are in favor of some proposal; and we ask
whether the belief that opinion in the population is actually evenly
divided on the proposal is consistent with this sample finding. This
amounts to asking whether, in sampling from a population in which
opinion is evenly divided, it is reasonable to suppose that a 60% : 40%
sample would arise through sampling fluctuations. As another example,
we may ask how reasonable is the belief that the mean number
of hours in the working week of Pennsylvania high school teachers is as
low as 40 hours, if the mean in a sample of 250 teachers is found to be
44 hours.
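
The first example can be made concrete with a little arithmetic. The text does not give the size of the sample of voters, so the sketch below assumes, purely for illustration, a random sample of 100 in which 60 are favorable; the function name `binomial_tail` is likewise our own. It computes exactly how often a split at least as uneven as 60:40 would arise in sampling from an evenly divided population.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    n = 100          # assumed sample size (not given in the text)
    favorable = 60   # 60 per cent of the assumed sample
    upper = binomial_tail(n, favorable)
    # In an evenly divided population a 40:60 split is exactly as
    # probable as a 60:40 split, so the two-sided figure is double.
    two_sided = 2 * upper
    print(f"P(60 or more of {n} favorable) = {upper:.4f}")
    print(f"P(split at least as uneven as 60:40) = {two_sided:.4f}")
```

With a different assumed sample size the figures would of course change; a 60:40 split in a sample of 20 is far less surprising than the same split in a sample of 1,000.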

As a second kind of question, we may ask within what limits must 
the value of a population parameter lie to be reasonably consistent with 
the value of a corresponding sample statistic. In the first example 
above, this amounts to asking within what limits must the population 
percentage of favorable opinion lie in order to make a sample con- 
taining 60 per cent favorable a reasonable (not improbable) occurrence. 
In the second example, we may ask within what limits must the popu- 
lation mean lie in order to make the sample mean 44 hours a not im- 
probable value. 
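
For the second example, limits of this kind can be sketched as follows. The calculation rests on assumptions that are ours rather than the text's at this point: a sample standard deviation of 5 hours, a normal sampling distribution of the mean, a standard error taken as the standard deviation divided by the square root of N, and "not improbable" read as the two-sided .05 level. The function name `limits_for_mean` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def limits_for_mean(sample_mean, sd, n, level=0.05):
    """Limits within which a population mean is consistent with a sample
    mean at the given two-sided level, assuming a normal sampling
    distribution with standard error sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - level / 2)
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

if __name__ == "__main__":
    low, high = limits_for_mean(44.0, 5.0, 250)  # sd of 5 hours is assumed
    print(f"{low:.2f} to {high:.2f} hours")
    # Does a hypothesized population mean of 40 hours fall inside the limits?
    print(low <= 40.0 <= high)
```

Under these assumptions the limits are narrow, roughly six tenths of an hour on either side of 44, so a population mean as low as 40 hours would not be reasonably consistent with the sample.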

It will be seen, as our discussion of sampling and inference pro- 
ceeds, that the two types of questions are closely related. This will be 
brought out in connection with testing hypotheses and statistical estima- 
tion. But first we need to take up the two bases of inference: probability 
and sampling distribution. 


THE MEANING OF PROBABILITY


The inferences that are drawn from a sample are always accompanied
by a statement indicating the amount or degree of uncertainty
in the inferences. These statements typically are made in
terms of probability.

The word probable, although usually somewhat ambiguous in
ordinary discourse, is rather universally used to qualify a proposition
that is regarded as less than certain, but for which there exists, or is
believed to exist, some evidence. Such statements as “It appears
probable that there is a real difference between these groups” and “There
probably is some relationship between frustration and neurosis” have
in common the idea that something is more likely to be true than false.
To be useful in statistics, however, the terms probable and probability
must be given exact, quantitative meaning.




Although any definition of probability raises stubborn philosophical 
questions (see Ref. 39, p. 242), in sampling theory it is rather generally 
agreed to treat probability as equivalent to relative frequency. Thus, 
to say that the probability of a head on a single toss of a coin is 1/2 is to 
imply that if the coin were tossed over and over again, the relative fre- 
quency of heads would approach 1/2 or .5. If each time the coin were 
tossed one would guess “heads,” he would be right one half of the time 
over the long run. To say that .51 is the probability that a child to be 
born will be a boy is to imply that in the past the relative frequency of 
male births has been found to be .51. To state that .16 is the probability 
of drawing at random from a normal population of measures a single 
measure which deviates from the mean of the population by +1σ or
more is to imply that in repeated sampling from the population the 
relative frequency of such deviates would approach .16. 
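
The idea that a relative frequency settles toward a fixed value over the long run can be illustrated by simulation. In the sketch below the numbers of trials, the seed, and the helper names (`relative_frequency`, `head`, `deviate_of_one_sigma_or_more`) are our own illustrative choices; the coin is assumed fair and the normal population is assumed to have mean 0 and standard deviation 1.

```python
import random
from statistics import NormalDist

def relative_frequency(event, trials, seed=0):
    """Relative frequency of `event` over a number of simulated trials.

    `event` is a function that takes a random.Random instance and
    returns True when the event occurs on that trial.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials

def head(rng):
    """One toss of a fair coin."""
    return rng.random() < 0.5

def deviate_of_one_sigma_or_more(rng):
    """One draw from a unit normal population, at least 1 above the mean."""
    return rng.gauss(0, 1) >= 1

if __name__ == "__main__":
    for trials in (100, 10_000, 200_000):
        print(trials, relative_frequency(head, trials),
              relative_frequency(deviate_of_one_sigma_or_more, trials))
    # Theoretical relative frequency of a deviate of +1 sigma or more.
    print(round(1 - NormalDist().cdf(1), 4))
```

The observed relative frequencies wander in short runs and steady as the number of trials grows; the last line gives the theoretical figure that the text rounds to .16.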

In general, we may say that the probability of an event is the ob- 
served relative frequency or theoretical relative frequency of the event 
over the long run. It is possible to define probability more rigorously, 
but, however defined, probability as used in statistical inference is 
equivalent to relative frequency. 

Probability figures may be expressed as fractions or as decimal 
numbers, the latter being the more common. They are essentially the 
relative frequencies of chance events. The way in which a probability 
figure is used to indicate quantitatively the degree of uncertainty in a 
statistical inference will be elaborated in later pages. 


THE SAMPLING DISTRIBUTION 


The whole theory of sampling and statistical inference is based 
upon probability and sampling distributions. In most sampling situa- 
tions, there are three distributions of concern: the sample distribution, 
the population distribution, and the sampling distribution. A simple 
illustration will help us to distinguish among the three. 

Suppose that we wish to know the mean number of hours in the 
working week of some 25,000 high school teachers in Pennsylvania and 
that owing to time or money considerations we can examine only a 
random sample of 250 teachers. The sample distribution would be 
merely the frequency distribution of the hours in the working week of
the sample of 250. It would be an observed distribution and one which
would ordinarily be described by certain of the summarizing statistics 
we discussed in Chapters III and IV. Ordinarily, the sample distribu- 
tion is the only concrete distribution in the sampling situation. 




Now if we had a record of the hours in the working week of all 
of the 25,000 high school teachers in Pennsylvania we could tabulate 
those hours in a frequency distribution. This would be the population 
distribution. Neither the form nor the parameters of the population 
distribution ordinarily are known, but we can draw inferences about 
them from the sample distribution. In our illustration, if the sample 
distribution approximated normality, we might infer that the population 
distribution is normal, ascribing any irregularities of the sample dis- 
tribution to sampling fluctuations. If the mean of the sample distribu- 
tion were, say, 44 hours and the standard deviation 5 hours, we might 
infer that the mean and standard deviation of the population are 
probably and approximately equal to 44 and 5 hours, respectively. 

Now such inferences are susceptible to sampling errors. We do
not have in our sample distribution an absolute form and a fixed mean
and standard deviation which would be observed again in a second sample
of 250 from the population. Indeed, if we were to take, say, 100 samples
of 250 each, we would have 100 different sample distributions, 100
varying means, and 100 varying standard deviations. If we grouped the
100 means in a frequency distribution, we would have an experimental
sampling distribution of the mean in samples of size 250 from our
population; if we grouped the 100 standard deviations, we would have an
experimental sampling distribution of the standard deviation. If we
were to continue sampling until we had drawn all possible samples of
size 250 from our population of about 25,000, we would arrive at exact
sampling distributions. This we cannot do, since the number of possible
samples would be the astronomical number of combinations of some
25,000 cases taken 250 at a time.
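
An experimental sampling distribution of this kind is easy to produce by machine. The sketch below is a hedged illustration: since the actual working weeks of the 25,000 teachers are unknown, it builds a hypothetical population that is assumed to be roughly normal with mean 44 and standard deviation 5 hours, and the function name `experimental_sampling_distribution` is our own.

```python
import math
import random
import statistics

def experimental_sampling_distribution(population, sample_size, n_samples, seed=0):
    """Means and standard deviations of repeated simple random samples
    drawn without replacement from `population`."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        means.append(statistics.mean(sample))
        sds.append(statistics.pstdev(sample))
    return means, sds

if __name__ == "__main__":
    # Hypothetical population of 25,000 working weeks, assumed roughly
    # normal with mean 44 and standard deviation 5 hours.
    rng = random.Random(42)
    population = [rng.gauss(44, 5) for _ in range(25_000)]

    means, sds = experimental_sampling_distribution(population, 250, 100)
    print(f"100 sample means range from {min(means):.2f} to {max(means):.2f}")
    print(f"standard deviation of the 100 means: {statistics.pstdev(means):.3f}")
    print(f"100 sample SDs range from {min(sds):.2f} to {max(sds):.2f}")

    # Number of distinct samples of 250 from 25,000: too many to enumerate.
    n_possible = math.comb(25_000, 250)
    print(f"possible samples: a number with {len(str(n_possible))} digits")
```

The last line shows why the exact sampling distribution cannot be reached by brute force: the count of possible samples runs to hundreds of digits.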

Except as a check and clarification of theory, experimental sam- 
pling distributions are of little use. Fortunately, it is possible to de- 
termine by analytic methods the distribution of a sample statistic, such 
as a mean or standard deviation, in all possible samples of a given size 
N, provided the population distribution is of known form. This dis- 
tribution is known as the sampling distribution of the given statistic. 
It is a theoretical or idealized distribution, but may be thought of as the 
distribution which would result if the values of a given statistic were 
computed in all possible samples of a given size actually drawn from 
the specified population.

Statisticians have been able to determine the sampling distribution
of the majority of commonly used statistics in samples drawn from a
normal population. When the samples are large, many of the statistics
are distributed normally, or nearly so. In small samples, certain
statistics are distributed symmetrically in what is known as the t
distribution. Other sampling distributions of importance in statistical
work include the binomial, the χ² (chi square) and the F distributions. In
later chapters we shall deal with the nature and use of three of these:
the normal, t, and χ² distributions.

When the sampling distribution of a statistic is known, it is possible 
to determine the relative frequency with which different sample values 
are expected to occur in random sampling from a population in which 
the statistic has some assumed or hypothesized value. Knowing the 
relative frequency, we at once know the probability that a particular 
sample value has resulted from sampling fluctuations, i.e., chance. It 
is this probability figure which enables us to judge the soundness of 
an hypothesis regarding the value of the statistic in the population. 


THE STATISTICAL HYPOTHESIS AND ITS TEST 


In a broad sense, the term hypothesis refers to a tentative statement
or proposition that may explain observed facts. The crucial step in
research is that of testing the hypothesis, and this test is always the
common sense one of determining whether the hypothesis is consistent
with the facts. Facts and hypothesis are in reciprocal relationship:
the facts suggest and support the hypothesis; the hypothesis explains
or accounts for the facts.

The necessity of hypothesis in statistical inference stems out of
our lack of knowledge about the population. Ordinarily the only
information we have about population form and parameters is that
provided by the sample. Because of sampling errors, such information
cannot be accepted at its face value. It can be used, however, to test
the reasonableness of the hypotheses we make about population form
and parameters.


The most useful and successful method yet devised of testing an 
hypothesis is based on the assumption that the hypothesis is true. The 
hypothesis is developed by “if-then” argument. When the “thens” or 
expectations which logically follow if the hypothesis is true are con- 
sistent with observable facts, the hypothesis is considered tenable; 
if not, it is rejected. 

In statistics, an hypothesis which is tested for possible rejection
under the assumption that it is true is known as a null hypothesis.
Essentially the null hypothesis assumes that some parameter of the
population has a certain value, and the hypothesis is tested by
determining whether the corresponding statistic in a sample drawn from
the population differs so much from the assumed value that the
difference cannot be reasonably explained in terms of sampling
fluctuations or chance. If so, the hypothesis is rejected; if not, it is
accepted.

To illustrate this somewhat roundabout procedure, suppose that 
we wish to know whether the mean IQ of a population of workers in a 
certain occupation in a given locality is, say, 100. The null hypothesis 
we wish to test is that the population mean is 100. We randomly draw 
a sample of workers and compute the mean. We then determine whether 
the difference between the sample mean and the hypothesized mean is 
too great to be reasonably ascribed to chance. If so, we reject the 
hypothesis that the population mean is 100; if not, we accept the 
hypothesis. It is important to note that the hypothesis is not disproved 
or proved; it is only shown to be inconsistent or consistent with the 
sample evidence. The procedures or mechanics in testing the null 
hypothesis will be described in later chapters. We shall find them 
relatively simple, once the underlying ideas are understood. 
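
Although the mechanics are left to later chapters, a rough preview of the IQ illustration may help fix the idea. Everything numerical in the sketch below is hypothetical: a sample of 100 workers with mean IQ 103 and standard deviation 15. It further assumes a sample large enough for the sampling distribution of the mean to be treated as normal, with standard error equal to the standard deviation divided by the square root of N. The function name `two_sided_probability` is our own.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_probability(sample_mean, hypothesized_mean, sd, n):
    """Probability, if the hypothesis is true, of a sample mean differing
    from the hypothesized mean by as much as the observed one in either
    direction. Assumes a normal sampling distribution of the mean with
    standard error sd / sqrt(n)."""
    z = (sample_mean - hypothesized_mean) / (sd / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

if __name__ == "__main__":
    # Hypothetical sample: 100 workers, mean IQ 103, standard deviation 15.
    p = two_sided_probability(103, 100, 15, 100)
    print(f"P = {p:.3f}")
```

A small value of P here would mean only that the sample is hard to reconcile with a population mean of 100; as the text stresses, the hypothesis is shown to be inconsistent with the evidence, not disproved.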

In a perfectly general sense, the testing of a null hypothesis con- 
sists in determining whether the difference between an assumed or 
hypothesized population value of a parameter and the observed sample 
value can reasonably be ascribed to chance. The determination is made 
in terms of probability—the smaller the probability, the less reasonable 
it is to conclude that the difference is due to chance. Stated another 
way, the smaller the probability, the more reasonable it is to conclude 
that there is a real or nonchance difference between sample and hy- 
pothesis. It follows that to accept a null hypothesis is to conclude that 
the observed difference may be due to chance; to reject a null hypothesis 
is to conclude that the difference is nonchance or real. 

The important question of how small a probability must be before
a real difference between sample and hypothesis is demonstrated does
not permit a general or final answer. An hypothesis does not suddenly
or clearly become untenable or rejectable at some probability figure.
The best that can be done is to judge the hypothesis in the light of levels
of significance. When a probability figure, P, is .10 (i.e., when an
observed difference between sample and hypothesis would arise not more
than 10 per cent of the time in random sampling from a population in
which the hypothesis is true), the difference is said to be significant at
the 10 per cent level; when P is .05, the difference is said to be significant
at the 5 per cent level; and so on.




It is becoming fairly common practice to accept a null hypothesis
when the significance level is above about 10 per cent, and to
reject it when the significance level is about 5 per cent* or below. When
the level falls between about 10 per cent and 5 per cent, the hypothesis
is considered in doubt but not clearly rejectable. However, it is always
open to the investigator to adopt a more or less exacting level before
concluding that the observed difference is due or not due to chance.
Moreover, one level of significance may be appropriate in a particular
situation, whereas a different level may be appropriate in another.
This matter will be taken up below in connection with errors in testing
hypotheses.

The limits of probability figures are 0 and 1. The nearer a probability
figure, P, approaches the lower limit 0, the less tenable the null
hypothesis. When P falls in the .10-.90 interval, say, the hypothesis is
ordinarily considered acceptable. When P falls above .90, the null
hypothesis is clearly acceptable, but the sampling procedures or
arithmetic computations are questionable. The nearer P approaches its
upper limit 1, the more questionable they become. Agreement between
sample and hypothesis so close as to yield a P of .90 or more arises
10 per cent or less of the time in random sampling from a population
in which the hypothesis is true. Such close agreement tends to be open
to suspicion as “too good to be true.”
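
The rough conventions of the last two paragraphs can be gathered into a small rule of thumb. The sketch below is only a summary device: the function name `judge_null_hypothesis` and the exact treatment of the boundary values .05, .10, and .90 are our assumptions, since the text speaks of "about" 5 and 10 per cent and leaves the investigator free to adopt other levels.

```python
def judge_null_hypothesis(p):
    """Classify a probability figure by the rough conventions in the text.

    The cut-offs (.05, .10, .90) are the approximate ones the text
    describes; an investigator may adopt others.
    """
    if not 0 <= p <= 1:
        raise ValueError("a probability figure must lie between 0 and 1")
    if p <= 0.05:
        return "reject"
    if p < 0.10:
        return "in doubt"
    if p <= 0.90:
        return "acceptable"
    return "acceptable, but agreement is suspiciously close"

if __name__ == "__main__":
    for p in (0.003, 0.05, 0.07, 0.40, 0.97):
        print(p, judge_null_hypothesis(p))
```

A mechanical rule of this sort is no substitute for judgment about the consequences of error in the particular situation.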

Tests of null hypotheses are commonly called tests of significance.
In summary statement, the purpose of a test of significance is to
determine the probability that an observed difference between the sample
value of some statistic and the value assumed by the hypothesis could
result from the fluctuations of random sampling, i.e., chance. The
outcomes of tests of significance are typically stated in terms of
probability figures or levels of significance. It should be emphasized that
there is no magic in probability figures or levels of significance. They
merely summarize the evidence against the null hypothesis. What to
do in the face of the evidence depends in large part on consideration of
errors in testing hypotheses. This is our next topic.

* [...] Now suppose [...] probability of the 4-head [...]


ERRORS IN TESTING HYPOTHESES 


An hypothesis about a population parameter obviously is either 
true or false, but unless we examine the entire population we cannot 
be certain which it is. When only a sample of evidence is available, as 
is generally the case, there is always the possibility of rejecting an hy- 
pothesis which is in fact true or of accepting an hypothesis which is in 
fact false. These possibilities are always present, because sample 
evidence permits only probability statements against or for hypotheses. 
Samples having very small probabilities, if the hypothesis is true, do
occasionally occur; on the other hand, samples having large probabilities
do occasionally occur, even though the hypothesis is false.

The test of a particular hypothesis about a population obviously 
will terminate in one of four results: (1) a true hypothesis will be ac- 
cepted, (2) a false hypothesis will be rejected, (3) a true hypothesis will 
be rejected, or (4) a false hypothesis will be accepted. There is no mis- 
take in the first or second result, and it is the aim of statistical test to 
achieve one or the other, i.e., to accept hypotheses which are in fact 
true and to reject hypotheses which are in fact false. Stated negatively, 
it is the aim of statistical test to avoid rejecting a true hypothesis
(commonly called an alpha, or Type I, error) and to avoid accepting a
false hypothesis (beta, or Type II, error).

A great deal of statistical theory is concerned with problems of
reducing and controlling the dangers of these two types of errors.
The problems are not simple, because, for fixed sample size, to reduce
the risk of the first type is to increase the risk of the second. By choosing
a .01 probability figure (1 per cent level of significance) instead of a .05,
one can reduce the risk of rejecting a true hypothesis, and can further
reduce it by choosing a .001 or 0.1 per cent level. Over the long run,
using the 5 per cent level, one will reject not more than 5 true hypotheses
in 100; using the 1 per cent level, not more than 1 in 100; using the 0.1
per cent level, not more than 1 in 1,000. Clearly, one can make the
risk of a Type I error as small as one pleases. However, since
hypotheses not rejected are considered acceptable, any reduction in the
risk of rejecting a true hypothesis is inevitably accompanied by an
increase in the risk of accepting a false one.
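
The long-run meaning of these figures can be checked by a sampling experiment in which the hypothesis is always true. In the sketch below, every detail of the setup is an assumption made for illustration: a normal population with mean 0 and known standard deviation 1, samples of 25, 4,000 repetitions, and the function name `type_one_error_rate`.

```python
import random
import statistics
from math import sqrt

def type_one_error_rate(critical_z, n_trials=4000, sample_size=25, seed=3):
    """Proportion of true hypotheses rejected in repeated sampling.

    Every sample is drawn from a normal population whose mean really is
    the hypothesized value (0 here, with known standard deviation 1), so
    each rejection is a Type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0, 1) for _ in range(sample_size)]
        z = statistics.mean(sample) / (1 / sqrt(sample_size))
        if abs(z) >= critical_z:
            rejections += 1
    return rejections / n_trials

if __name__ == "__main__":
    # Two-sided critical values for the 5, 1, and 0.1 per cent levels.
    for level, critical_z in ((0.05, 1.960), (0.01, 2.576), (0.001, 3.291)):
        rate = type_one_error_rate(critical_z)
        print(f"level {level}: rejected {rate:.4f} of true hypotheses")
```

The observed proportions should fall near the chosen levels, differing from them only by the fluctuations of the experiment itself.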




This dilemma can be portrayed graphically. Consider a situation
in which it is known that the sampling distribution of a statistic, say
the mean, is normal in form with unit standard deviation and suppose
the hypothesis that the mean, μ (mu), of the population has a certain
value is being tested. In this situation, if the hypothesis is true, the
mean of the sampling distribution is equal to μ, as shown in Figure 9.1.
Now if the mean of a random sample of size N differs from the expected
value μ sufficiently to fall in one or the other of the tail portions marked
off in the figure, the hypothesis can be rejected at the 5 per cent level
of significance, since the relative frequency or probability of samples
of size N having means which differ positively or negatively from μ by
as much as the given sample is .05. The two tail portions combined
constitute what is called the critical region or the region of rejection.
On the other hand, if the mean of the sample has a value sufficiently
close to μ to fall in the region between the two tail portions or the
region of acceptance, the hypothesis is acceptable, in the sense that it
cannot be rejected at the level adopted.
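
The boundaries of the region of rejection follow directly from the unit normal curve. The sketch below assumes, as in the situation just described, a normal sampling distribution with unit standard deviation; the function names `region_of_rejection` and `in_region_of_rejection` and the sample values tried are hypothetical.

```python
from statistics import NormalDist

def region_of_rejection(level):
    """Boundaries, in standard-error units from the hypothesized mean,
    of a two-sided region of rejection in a unit normal sampling
    distribution."""
    z = NormalDist().inv_cdf(1 - level / 2)
    return -z, z

def in_region_of_rejection(sample_mean, hypothesized_mean, level=0.05):
    """True if the sample mean falls in either tail portion. The standard
    deviation of the sampling distribution is taken to be 1, as in the
    situation described in the text."""
    low, high = region_of_rejection(level)
    deviation = sample_mean - hypothesized_mean
    return deviation <= low or deviation >= high

if __name__ == "__main__":
    for level in (0.10, 0.05, 0.01):
        low, high = region_of_rejection(level)
        print(f"level {level}: reject below {low:.2f} or above {high:.2f}")
```

At the .05 level the boundaries come out close to the familiar 1.96 on either side of the hypothesized mean.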

It will be seen that the proportion of area in the region of rejection 
corresponds to the probability of rejecting an hypothesis which is in 
fact true or the probability of a Type I error. If this region is made 
smaller, the risk of rejecting a true hypothesis is decreased. If this is 
done, however, the region of acceptance necessarily becomes larger. 
When the region of acceptance is increased in size, a greater discrepancy
between sample value and expected value is tolerated before the hy- 
pothesis is declared false. As a consequence, the risk of accepting a 
false hypothesis, the Type II error, is increased. 
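
The trade between the two risks can be put in figures once we are willing to suppose how far the actual population mean lies from the hypothesized one. That distance is never known in practice; the value of 2 standard errors used below is purely an assumption for illustration, as is the function name `type_two_risk`. The sampling distribution is again taken to be normal with unit standard deviation.

```python
from statistics import NormalDist

def type_two_risk(level, true_shift):
    """Probability of accepting a false hypothesis in a two-sided test.

    `true_shift` is the distance of the actual population mean from the
    hypothesized mean, in standard-error units (a hypothetical quantity:
    in practice it is unknown).
    """
    unit_normal = NormalDist()
    z = unit_normal.inv_cdf(1 - level / 2)
    # The sample mean is centered on the actual mean, so the region of
    # acceptance (-z, +z) is shifted by true_shift relative to it.
    return unit_normal.cdf(z - true_shift) - unit_normal.cdf(-z - true_shift)

if __name__ == "__main__":
    shift = 2.0  # assumed: actual mean lies 2 standard errors from hypothesis
    for level in (0.10, 0.05, 0.01, 0.001):
        print(f"Type I risk {level}: Type II risk {type_two_risk(level, shift):.3f}")
```

As the region of rejection is narrowed from the .10 to the .001 level, the computed risk of accepting the false hypothesis rises steadily, which is the dilemma described above.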


CONTROLLING THE RISK OF ERROR IN TESTING HYPOTHESES 


[Figure 9.1: normal curve centered at μ, with the region of acceptance lying between -1.96 and +1.96 standard units and a region of rejection in each tail.]

Fig. 9.1. A two-sided .05 region of rejection in a normal sampling distribution.

In the attempt to control the risk of the Type I error, the usual 
procedure is to limit the risk of rejecting a true hypothesis to some 
preassigned amount, e.g., .10, .05, or .01. The selection of the probability 
figure depends largely upon the nature of the problem. If the conse- 
quences of rejecting an hypothesis which is in fact true are of serious 
concern, a low probability figure, perhaps .01 or .005 or even a smaller 
one, would be selected. If, however, the consequences of rejecting a 
true hypothesis are relatively unimportant, one may wish to reject an 
hypothesis if there is even slight evidence against it. In this case a high 
probability figure, perhaps one as high as .10, may be desirable. 

Consider a situation, for example, in which a new method of 
teaching physics is being tried out in samples of students which can be 
considered representative of the population of future students who will 
study physics. If the new method would require radical or expensive 
changes in class size, school routine, or equipment, the school officials 
likely would hesitate to reject the hypothesis that the new method is no 
better than the old unless the evidence against it were strong. But if 
the new method were as practicable as the old and no more expensive, 
the hypothesis likely would be rejected on weaker evidence. 

In general, the more serious the consequences of rejecting a true 
hypothesis, the lower the probability figure or the smaller the region of 
rejection one would select. 

After a probability figure has been selected to limit the risk of 
rejecting a true hypothesis to some desired amount, the risk of accepting 
a false hypothesis (Type II error) must be considered. As we have seen, 
the risk of the latter is inevitably increased when the risk of the former 
is decreased. The risk of accepting a false hypothesis is further affected 
by the relation of the actual value of the population parameter to its 
hypothesized value. If the hypothesis is in fact false, the actual value of 
the parameter obviously must be either less than or greater than the 
value stated by the hypothesis. The way in which these two alternatives 
influence procedures in testing hypotheses is best seen in reference to 
possible regions of rejection and acceptance in a sampling distribution 
used in testing hypotheses about, say, a population mean. One 
arrangement of such regions is shown in Figure 9.1. It is shown again 
with two other possible arrangements in Figure 9.2. (Smaller or larger 
regions of rejection could of course be chosen.) The risk of rejecting 
an hypothesis concerning the value of μ which is in fact true is .05 for 
each of the three arrangements, but the risk of accepting a false 
hypothesis is partly determined by the relation of the actual value of μ 
to its hypothesized value. If the hypothesis is false because μ is actually 
less than its hypothesized value (see Figure 9.3), in successive sampling 
a greater number of sample means will fall in the (a) region of rejection 


[Figure 9.2: three normal curves, each with its region of acceptance marked. (a) a .05 region of rejection in the left tail, below -1.64; (b) .025 regions of rejection in both tails, beyond -1.96 and +1.96; (c) a .05 region of rejection in the right tail, above +1.64.]

Fig. 9.2. Three arrangements of regions of rejection in a normal sampling 
distribution which limit the risk of rejecting a true hypothesis to .05. 


is actually greater than its hypothesized value, the (c) region is best, 
in the sense that it reduces the risk of accepting the false hypothesis to 
a minimum. Neither the (a) nor the (c) region, however, is a good 
safeguard against the opposite alternative. The (b) region effects a 


Fig. 9.3. Assumed (solid curve) and true (broken curve) sampling distributions, 
shown for arrangements (a), (b), and (c), in case μ is in fact less than its 
hypothesized value $\mu_H$. In this case the probability of a sample mean 
falling in a region of rejection is greatest for (a). 


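null hypothesis and $H_1$, $H_2$, and $H_3$ to stand for the alternative 
hypotheses, we may pair hypotheses 

$$
\begin{array}{lll}
H_0\colon \mu = K & \qquad H_0\colon \mu = K & \qquad H_0\colon \mu = K \\
H_1\colon \mu < K & \qquad H_2\colon \mu \neq K & \qquad H_3\colon \mu > K.
\end{array}
$$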


Now if we wish to have maximum protection against accepting $H_0$ 
if it is false because $H_1$ is true, we should use the left-hand or (a) region 
of Figure 9.2. On the other hand, if we wish to have maximum 
protection against accepting $H_0$ if it is false because $H_3$ is true, we should 
use the right-hand or (c) region. However, if we wish to have maximum 
protection against accepting $H_0$ if it is false because either $H_1$ or $H_3$ 
is true, we should use the two-sided, or (b) region, since that region 
protects against both alternatives. 

Technically, $H_1$ and $H_3$ are known as one-sided alternatives, and 
the test of $H_0$ against either is known as a one-sided, or one-tail, test. 
$H_2$ is known as a two-sided alternative, and the test of $H_0$ against it is 
known as a two-sided, or two-tail, test. 
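
The protection afforded by each arrangement can be compared by computing the probability that a sample mean falls in the region of rejection. The following is a rough sketch under stated assumptions: a normal sampling distribution with unit standard error, and a true mean displaced from K by an arbitrary `shift` (here 2 standard errors either way). The function name and the shift values are hypothetical.

```python
from statistics import NormalDist


def rejection_probability(region, shift, alpha=0.05):
    """Probability that a sample mean falls in the region of rejection.

    region: "a" (left tail), "b" (two tails), or "c" (right tail), as in Figure 9.2.
    shift:  assumed true mean minus hypothesized mean, in standard-error units.
    """
    null = NormalDist()                     # sampling distribution assumed under H0
    true = NormalDist(mu=shift, sigma=1.0)  # sampling distribution actually in force
    if region == "a":
        return true.cdf(null.inv_cdf(alpha))
    if region == "c":
        return 1.0 - true.cdf(null.inv_cdf(1 - alpha))
    if region == "b":
        cut = null.inv_cdf(1 - alpha / 2)
        return true.cdf(-cut) + (1.0 - true.cdf(cut))
    raise ValueError("region must be 'a', 'b', or 'c'")


for shift in (-2.0, 0.0, 2.0):
    probs = {r: round(rejection_probability(r, shift), 3) for r in "abc"}
    print(f"shift {shift:+.1f}: {probs}")
```

With a shift of zero (a true hypothesis) all three regions reject with probability .05. With a negative shift region (a) rejects most often and (c) almost never; with a positive shift the order is reversed, and (b) lies between the two in both cases.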

There has been a good deal of controversy over the use of one-sided 
tests in psychological and educational research. The controversy 
is too extensive and technical for consideration here. We shall remark 
only that two-sided tests are usually appropriate. (See Refs. 44 and 51 
for more about the rationale of one-sided and two-sided tests.) 

Let us note again that, in all three arrangements of Figure 9.2, the 
risk of rejecting a true hypothesis (Type I error) is the same. This risk 
is controlled by the size of the region of rejection, not its location. The 
nature of the problem ordinarily suggests both the appropriate size 
and the appropriate location of the region of rejection. In the chapters 
following, these general ideas will be applied to real sampling problems. 

Both the size of the sample and the statistic employed in testing 
hypotheses about a population have bearing upon the risk of erroneous 
inferences. We shall later see that the variability or spread of a sampling 
distribution is decreased if the size of the sample is increased. In other 
words, the larger the sample, the less a sample statistic varies in 
successive sampling. It follows that as sample size is increased, greater 
precision in testing hypotheses is possible. 

Different statistics may be employed in testing hypotheses. For 
example, the median as well as the mean may be used in testing an 
hypothesis regarding the point of central tendency of a population. 
However, the sampling distribution of the median is characterized 
by greater variation or spread than that of the mean. The use of the 
mean in preference to the median is, therefore, equivalent to employing 
a larger sample. A statistic which shows less variation in successive 
samples than the other statistics of its class is said to be efficient. 
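
The greater spread of the median can be seen in a small sampling experiment. This is a sketch only: the normal population (mean 40, standard deviation 10), the sample size of 25, the 2,000 repetitions, and the seed are all arbitrary choices made for illustration.

```python
import random
import statistics


def sampling_sd(statistic, n_samples=2000, sample_size=25, seed=1):
    """Standard deviation of a statistic over repeated random samples.

    Samples are drawn from an assumed normal population (mean 40, SD 10).
    Using the same seed for every statistic means each one is computed
    on exactly the same samples.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        sample = [rng.gauss(40.0, 10.0) for _ in range(sample_size)]
        values.append(statistic(sample))
    return statistics.pstdev(values)


sd_mean = sampling_sd(statistics.mean)
sd_median = sampling_sd(statistics.median)
print(f"spread of sample means:   {sd_mean:.2f}")
print(f"spread of sample medians: {sd_median:.2f}")
```

The medians should show the larger spread, so that for a normal population the mean is the more efficient of the two statistics.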

The probability that a statistical test will reject the null hypothesis 
when an alternative hypothesis is true is called the power of the test. 
In other words, the power of a test is the probability of not making a 
Type II error. We shall have occasion in later chapters to compare 
several statistical tests in terms of their power. 
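
Power can be computed directly when the sampling distribution is normal and the population standard deviation is taken as known. The sketch below assumes both, and the numbers used (hypothesized mean 100, true mean 103, standard deviation 15) are hypothetical. It also illustrates the earlier remark that larger samples permit greater precision in testing.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(mu_true, mu_hyp, sigma, n, alpha=0.05):
    """Power of a two-sided test of H0: mu = mu_hyp, with sigma assumed known."""
    se = sigma / sqrt(n)                        # standard error of the mean
    cut = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = .05
    lower, upper = mu_hyp - cut * se, mu_hyp + cut * se  # region of acceptance
    sampling = NormalDist(mu=mu_true, sigma=se)          # actual sampling distribution
    beta = sampling.cdf(upper) - sampling.cdf(lower)     # risk of a Type II error
    return 1.0 - beta


for n in (25, 100, 400):
    print(n, round(power_two_sided(103, 100, 15, n), 3))
```

When the true mean equals the hypothesized mean, the function returns the level of significance itself, since rejecting is then a Type I error.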


ESTIMATION OF POPULATION PARAMETERS 


It is sometimes the case that neither past experience nor the nature 
of the problem suggests a particular hypothesis to be tested. Even when 
a particular hypothesis is suggested, the statistician may wish to de- 
termine what hypotheses in general are acceptable and what are not, 
or to determine the single hypothesis which is best supported by a 
sample in hand. In these situations, the sample statistics are used to 
estimate values of the corresponding population parameters. 

There are two problems in statistical estimation. One of these is 
concerned with finding a single value which can be considered the “best” 
estimate of a parameter that can be made from a given sample statistic. 


It is generally true that the 


ter is either the value of the 
corresponding sample statistic or a value readily obtained from it. For 


ation, independent of the mean, 


is the standard deviation of the sample multiplied by $\sqrt{N/(N-1)}$. 
Point estimates, although of great interest and importance in 
statistical theory, are not very meaningful in the practical situation 


terval is designated as the 90 per cent confidence interval, and its end- 
points are designated the 90 per cent confidence limits. Other intervals 
and limits are similarly described. 

As a rule, given the value of a sample statistic, the lower limit of, 
say, the 90 per cent confidence interval for the corresponding param- 
eter is found by determining a value of the parameter such that the 
probability of a sample value equal to or greater than the one observed 
is .05. The upper limit is found by determining a value of the param- 
eter such that the probability of a sample value equal to or less than 
the one observed is .05. Over the long run, intervals thus determined 
must include the parameter 90 per cent of the time. It follows, of course, 
that any hypothesis proposing a value of the parameter not included 
by the 90 per cent confidence interval can be rejected at least at the 
10 per cent level of significance, while one proposing a value not in- 
cluded by the 95 per cent confidence interval can be rejected at least 
at the 5 per cent level of significance, and so on. 
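
The long-run property of confidence intervals can be imitated by simulation. The sketch below rests on assumptions of our own: a normal population with mean 40 and standard deviation 10, the standard deviation treated as known, samples of 25, and 2,000 repetitions with a fixed seed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=40.0, sigma=10.0, n=25, level=0.90, trials=2000, seed=7):
    """Proportion of simulated confidence intervals that include mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.64 for the 90 per cent interval
    se = sigma / sqrt(n)                       # sigma is assumed known here
    hits = 0
    for _ in range(trials):
        m = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if m - z * se <= mu <= m + z * se:
            hits += 1
    return hits / trials


observed = coverage()
print(f"proportion of 90 per cent intervals covering the parameter: {observed:.3f}")
```

The proportion printed should be close to .90, though not exactly .90, since only a finite number of intervals is examined.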

Ordinarily, either a 90 per cent or a 95 per cent confidence interval 
is determined. Intervals may be determined which allow other than 
90 or 95 per cent confidence in the statement, "The parameter lies 
somewhere in this interval." The disadvantage of high confidence 
intervals is that, for fixed sample size, such intervals are relatively wide 
and hence lacking in precision. In practice, the statistician ordinarily 
is willing to sacrifice some confidence in order to obtain a closer 
estimate. For this reason, the 95 per cent interval is the one most 
frequently employed. Much depends, of course, upon the nature of the 
problem. If it is extremely important that the interval cover the value 
of the parameter, a high confidence interval, perhaps one as high as 
99 per cent, would be determined. 

As pointed out previously, the variability or spread of the sampling 
distribution decreases as the size of the sample increases. As a result, 
any particular confidence interval is narrowed as sample size increases. 
In other words, the larger a sample the more precise the estimate it 
affords, at no expense to confidence. 


SIGNIFICANCE AND RELIABILITY OF SAMPLE STATISTICS 


In practical research it is desirable to distinguish between the 
significance and the reliability of sample statistics. As used in statistics, 
the term significance is limited to mean the probable existence of a 
nonchance difference between observation and expectation or between 
sample and hypothesis. Tests of significance relate to the single 
question: Is it reasonable to suppose that an observed difference has occurred 
as a result of random fluctuations of sampling? The question typically 
is answered by testing the null hypothesis. A difference is significant 
or not significant depending upon whether it cannot be or can be rea- 
sonably attributed to sampling fluctuations or chance. The probability 
that an observed difference could have arisen owing to chance gives 
objective meaning to the term significance. 

The reliability of a statistic, such as a mean or a measure of rela- 
tionship, depends upon the extent to which the statistic can be expected 
to fluctuate in successive similar samples from the same population. 
The less the expected fluctuation, the more reliable the sample statistic 
is as an estimate of the corresponding population parameter. The 
reliability of a statistic usually is described in terms of confidence 
intervals. 

Unless it has statistical significance, of course, sample information 
is not reliable, since, for all we can tell, chance may have operated to 
produce it. Sample information, however, can be statistically signifi- 
cant without possessing sufficient reliability to permit serviceable esti- 
mates or useful prediction. This will be emphasized later in connection 
with multiple correlation and regression (p. 248). Small sample 
information, in particular, usually is of low reliability. There is no exception 
to the rule that, other things being equal, the larger the sample the 
more reliable the information it provides. 


CONCLUDING REMARKS 


Statistical inference is concerned either with testing hypotheses or 
with estimation. In order to use a sample statistic in testing an hy- 
pothesis about the value of the corresponding population parameter 
or in estimating the value of the parameter, we must first of all know 
the sampling distribution of the statistic. The general ideas underlying 
the use of the sampling distribution have been discussed in this section; 
the application of these ideas to the normal, the t, and the chi-square 
sampling distributions will be illustrated in the following chapters. 

Before leaving this chapter, however, we need to emphasize the 
necessity of sound data in inference. There are four plausible explana- 
tions for an observed difference between sample and hypothesis: 
(1) bias in sampling or in observing, (2) variable errors of measurement, 
(3) random sampling fluctuations or chance, and (4) difference between 
population and hypothesis. The explanations are not mutually 
exclusive. A test of significance is dependable only if the original 
observations are unbiased and reliable. The test indicates only whether it is 
more reasonable to conclude that the population actually differs from 
the hypothesis than that the difference is due to chance. There is 
nothing in a test of significance that purifies or improves faulty data. 


EXERCISES 


1. Below are several important words and phrases in statistical inference. 
Read, think, and reread about them until you have a working definition of 
each. 


alternative hypothesis      region of acceptance 
interval estimate           region of rejection 
level of significance       reliability 
null hypothesis             sample 
parameter                   sampling distribution 
point estimate              sampling error 
population                  statistic 
power of a test             test of significance 
probability                 Type I error 
random sampling             Type II error 


2. A social worker interviewed all of the students who had dropped out of 
school during a certain year, hoping thereby to determine what condi- 
tions needed to be corrected in order to decrease the number of dropouts. 
He claimed that, since he interviewed all dropouts, he had no sampling 
problem. Do you agree? If not, what is the population of concern? 
Under what conditions can the worker make inferences about this 
population? 

3. An investigator determined the mean IQ in a random sample of 100 
twelve-year old children of foreign-born parents in a certain city. Identify 
the population, the sample, and the sampling distribution in this situation. 

4. Suppose you were asked to determine experimentally the sampling 
distribution in (3) above. How would you proceed? 

5. What are the two broad problems with which statistical inference is 
concerned? In what way are these problems related? 

6. In order to be certain regarding the form of a population distribution or 
the values of its parameters, what would one have to do? 

7. In terms of relative frequency, what does it mean to say (a) that the 
probability of drawing a red ball from an urn is 4, (b) that in throwing a 
single die the probability of a 5-spot is 1/6, (c) that the probability that a 
man of given age will live to be 63 is 3? 

8. According to its classical definition, probability is the ratio of the number 
of “favorable” cases to the total number of equally likely cases. In what 
way is the definition circular? 


9. The distribution of worker accidents in an industrial plant during a 
certain period was 

NUMBER OF ACCIDENTS      NUMBER OF 
    PER WORKER            WORKERS 
        0                   890 
        1                    75 
        2                    24 
        3                     6 
        4                     3 
        5                     0 
        6                     2 
                          ----- 
                          1,000 

(a) What is the relative frequency of workers who had more than 1 
accident during the period? (b) What is the probability that a worker picked 
at random will have had more than 1 accident during the period? 

10. If a single score is drawn at random from a normal distribution of scores, 
what is the probability that it will fall in the interval M ± 1σ? 

11. Suppose it is known that the sampling distribution of the mean of samples 
of size N from a given population is normal with a mean of 10 and a 
standard deviation of 1. What is the relative frequency or probability of 
samples having a mean of 11 or more? A mean of 9 or less? A mean 
between 9 and 11? 

12. Consider the Type I and Type II errors of inference. With which type is 
level of significance associated? With which type power of the test? 

13. The three most widely used sets of regions of rejection and acceptance are 
shown in Figure 9.2. Various other sets are of course possible. For example, one 
might use a 5 per cent region of rejection comprising 1 per cent of the 
area in the left tail and 4 per cent in the right tail. When might such a set 
be appropriate? 


CHAPTER X 


The Normal Sampling Distribution 


When a population is normal in form, a great many sample sta- 
tistics are distributed normally or nearly so; in fact, as sample size 
increases the sampling distributions of most of the commonly used 
statistics approach normality. 

When it can be assumed that the sampling distribution of a 
statistic is normal and when the standard deviation of the distribution 
can be determined, inferences about the corresponding population 
parameter can readily be made by use of a table of normal areas. The 
area under the curve corresponding to a specified interval on the base 
line indicates the relative frequency or probability of sample values 
falling in the interval, and hence may be used in testing hypotheses 
about, and determining confidence intervals for, population parameters. 

We shall find little that is new in this application of the normal 
curve. In making inferences about predicted scores and about true 
scores, we used normal curve relationships in this way. The 
procedures are readily extended to drawing inferences about parameters 
from single samples and from two samples. 


Inferences from Single Samples 


Single-sample problems are common in statistical work. In 
standardizing a new intelligence test, for example, the test constructor 
may wish to know whether the mean IQ of a sample is significantly 
different from 100. In a study of opinion about some issue, the poll- 
ster may wish to know whether a sample proportion is significantly 
different from .50. In studying relationships, the investigator may 
wish to know whether a sample correlation coefficient is significantly 
different from zero. Or, it may be desired to determine confidence 
intervals for a population mean, proportion, or correlation coefficient. 
We shall consider such problems in the following pages. 


THE SAMPLING DISTRIBUTION OF THE MEAN 


It can be shown mathematically that the means of all possible 
samples of size N from a normal population are distributed normally, 
and that the mean of the distribution (the mean of means) is equal to 
the mean μ of the population and the standard deviation equal to the 
standard deviation σ of the population divided by the square root of N. 

The demonstration is beyond the scope of this book, but we can 
check and clarify the statements experimentally. The distribution 
of the means of 120 samples of 25 each is shown in Table 10.1. 
The samples were drawn at random from the normal population of 
400 scores, Table B, Appendix, in which μ = 40.0 and σ = 10.0. 
The distribution of means approximates normality closely (see Figure 
10.1) with mean of 39.85 and standard deviation of 2.04. These values 
are in good agreement with the theoretically correct values 40.00 and 
$10.0/\sqrt{25}$ or 2.00, respectively; in fact, the agreement is unusually 
good, considering that our experimental distribution comprises only 
120 of the many possible samples of 25 each from this population 
of 400. 

TABLE 10.1 
Mean Values of 120 Samples of 25 from a Normal Population 
in which μ = 40.0 and σ = 10.0 

MEAN OF SAMPLE      FREQUENCY 
    45.5-                1 
    44.5-                0 
    43.5-                3 
    42.5-                7 
    41.5-               14 
    40.5-               22 
    39.5-               25 
    38.5-               18 
    37.5-               16 
    36.5-                8 
    35.5-                3 

[Figure 10.1: histogram with Frequency on the vertical axis and Mean of sample on the horizontal axis.]

Fig. 10.1. Histogram of distribution of 120 sample means and superimposed 
normal curve. (From Table 10.1.) 
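
The reader with a computer at hand can imitate this sampling experiment. The sketch below is not the author's procedure: it draws from an idealized, unlimited normal population with μ = 40.0 and σ = 10.0 rather than from the 400 scores of Table B, and the seed is arbitrary, so its results will differ somewhat from those of Table 10.1.

```python
import random
import statistics


def simulate_sample_means(n_samples=120, sample_size=25,
                          mu=40.0, sigma=10.0, seed=1965):
    """Means of repeated random samples from an assumed normal population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


means = simulate_sample_means()
mean_of_means = statistics.fmean(means)   # should be near 40.0
sd_of_means = statistics.pstdev(means)    # should be near 2.0
theoretical_se = 10.0 / 25 ** 0.5         # sigma / sqrt(N)

print(f"mean of means {mean_of_means:.2f}, SD of means {sd_of_means:.2f}, "
      f"theoretical SD {theoretical_se:.2f}")
```

Raising `n_samples` well above 120 should bring the two computed values closer still to 40.0 and 2.00.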

Many such experiments have been made and they support the 
theoretically exact conclusion stated above, namely, that when a 
population is normal in form with mean μ and standard deviation σ, 
the sampling distribution of the mean of samples of size N is normal 
with mean μ and standard deviation $\sigma/\sqrt{N}$. The latter quantity is 
generally known as the standard error of the mean and is designated 
$\sigma_M$ or $\sigma_{\bar{X}}$. Using this notation we have 

$$\sigma_M = \frac{\sigma}{\sqrt{N}}. \tag{10.1}$$

The relationship expressed in (10.1) obtains exactly when the standard 
deviation, σ, of the population is known. It is exact enough to be 
useful in inference when σ is unknown, provided the sample is not 
small, say not less than about 30. When N is about 30 or more, the 
standard deviation s of the sample may be substituted for σ in formula 
(10.1) without introducing serious error. Making the substitution, 
we have 

$$s_M = \frac{s}{\sqrt{N - 1}}, \tag{10.2}$$


which gives the standard error of the mean in terms of sample statistics. 

The reason that we divide by $N - 1$ instead of N in (10.2) is that 
the sample standard deviation tends to underestimate the population 
standard deviation. (See p. 72.) We get a better approximation 
to $\sigma_M$ by dividing by $N - 1$ in (10.2). 

Since the standard error of the mean can be approximated from 
sample statistics and since the sampling distribution of the mean 
tends to be normal in form, it follows that normal curve relationships 
can be used in making inferences about a population mean, provided 
the sample is about 30 or more in size. 
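
Formula (10.2) can be checked on any set of scores. In the sketch below the 36 scores are made up purely for illustration. The check relies on the relation stated in the preceding chapter, that the estimated population standard deviation is the sample standard deviation multiplied by $\sqrt{N/(N-1)}$; dividing that estimate by $\sqrt{N}$ gives the same result as dividing s by $\sqrt{N - 1}$.

```python
from math import sqrt
from statistics import pstdev, stdev


def standard_error_of_mean(scores):
    """Formula (10.2): s / sqrt(N - 1), where s is computed with divisor N."""
    n = len(scores)
    s = pstdev(scores)  # sample standard deviation, divisor N
    return s / sqrt(n - 1)


# Hypothetical scores, invented for this illustration only.
scores = [40 + (7 * i) % 23 - 11 for i in range(36)]

se_from_s = standard_error_of_mean(scores)
# stdev() uses divisor N - 1, i.e. the estimated population standard deviation.
se_from_estimate = stdev(scores) / sqrt(len(scores))

print(f"{se_from_s:.4f}  {se_from_estimate:.4f}")
```

The two printed values agree, which shows that (10.2) simply applies (10.1) to the estimated population standard deviation.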


TESTING HYPOTHESES ABOUT POPULATION MEANS 


The general procedures in testing hypotheses and controlling 
risks of the Type I and Type II inferential errors are readily applied 
to the sampling distribution of the mean. The three sets of .05 regions 
of rejection in Figure 9.2 are, for the sampling distribution of the 
mean, determined by the points at $-1.64s_M$, $\pm 1.96s_M$, and $+1.64s_M$, 
as shown in Figure 10.2. The .10 regions would be determined by the 
points at $-1.28s_M$, $\pm 1.64s_M$, and $+1.28s_M$, respectively. The points 
determining other sets of regions of rejection may readily be 
determined from the table of normal areas. The reasons for selecting a 
region of rejection of particular size and location were discussed on 
pp. 226-229. 
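
In place of the table of normal areas, the determining points can be obtained from an inverse normal function. The sketch below uses `NormalDist.inv_cdf` from the Python standard library; the function name `critical_points` is our own.

```python
from statistics import NormalDist


def critical_points(alpha):
    """Points, in standard-error units, bounding the three regions of rejection."""
    z = NormalDist()
    return {
        "a": z.inv_cdf(alpha),                                # left tail only
        "b": (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2)),  # both tails
        "c": z.inv_cdf(1 - alpha),                            # right tail only
    }


for alpha in (0.05, 0.10, 0.01):
    pts = critical_points(alpha)
    print(alpha, round(pts["a"], 2),
          tuple(round(v, 2) for v in pts["b"]), round(pts["c"], 2))
```

Each value is to be multiplied by $s_M$ to locate the point on the scale of the sample mean.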

To test an hypothesis about the mean μ of a population, we need 
only to determine whether the difference between the mean of a 
sample and the mean proposed by the hypothesis is, in standard units, 
sufficiently large to fall in the particular region of rejection selected. 


[Figure 10.2: three sampling distributions of the mean, each with its region of acceptance marked. (a) region of rejection below $-1.64s_M$; (b) regions of rejection beyond $-1.96s_M$ and $+1.96s_M$; (c) region of rejection above $+1.64s_M$.]

Fig. 10.2. Three .05 regions of rejection in the sampling distribution of the 
mean. 


If we use z* to designate a difference or deviation from the mean of a 
normal sampling distribution expressed in standard units, we have 

$$z = \frac{M - \mu}{s_M} = \frac{M - \mu}{s/\sqrt{N - 1}}, \tag{10.3}$$

where M is the mean of the sample, μ is the proposed or hypothesized 
mean of the population, and $s_M$ is the standard error of the mean. 
z is the crucial statistic for testing an hypothesis about the mean. 

The procedures in testing an hypothesis about a population mean 
are illustrated in the following example: 


EXAMPLE. The mean of the 138 VAT scores of Table A, Appendix, 
is 550.37 and the standard deviation is 92.73. Test the hypothesis 
at the 5 per cent level that the population represented by this sample 
has a mean of 560. 

We wish to test the null hypothesis $H_0$: μ = 560. The 
appropriate alternative hypothesis is $H_2$: μ ≠ 560, since we have no 
reason to be more concerned about the “less than” than the “greater 
than” alternative. To reject $H_0$, we need a |z| of 1.96 or more. 
(Why?) By formula (10.3), we have 

$$z = \frac{550.37 - 560}{92.73/\sqrt{138 - 1}} = \frac{-9.63}{7.92} = -1.22.$$

We cannot reject $H_0$ at the 5 per cent level; in fact, the probability 
P corresponding to -1.22, for a two-sided test, is twice the .11 we 
read in Table C, Appendix, or .22. 

In reporting the results of the above test, a statement something 
like this would be appropriate: “The hypothesis that the population 
mean is 560 cannot be rejected at the 5 per cent level, since P from the 
test is .22.” In writing about the results of tests of significance, it is 
generally a good idea to report P as well as the decision not to reject 
or to reject $H_0$. 
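
The arithmetic of the example can be verified as follows. This is a sketch of formula (10.3) only; the function name `z_test_mean` is our own, and the normal areas come from the Python standard library rather than from Table C.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, sample_sd, n, mu_hyp):
    """Formula (10.3): z for a hypothesized mean, with its two-sided P."""
    se = sample_sd / sqrt(n - 1)                       # formula (10.2)
    z = (sample_mean - mu_hyp) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, p_two_sided


z, p = z_test_mean(550.37, 92.73, 138, 560)
print(f"z = {z:.2f}, two-sided P = {p:.2f}")
```

Since |z| is less than 1.96, the sample mean falls in the region of acceptance of the two-sided .05 test.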


* The symbol CR (critical ratio) is used sometimes, instead of z, to designate 
a deviation from the mean of a normal sampling distribution expressed in standard 
units. 


ESTIMATION OF A POPULATION MEAN 


The maximum likelihood or “best” estimate of a population 
mean μ that can be made from a single sample is the mean M of the 
sample. It is usually desirable, however, to know the extent to which 
the point estimate may be in error. Hence, in applied statistics, a 
confidence interval for the population mean, rather than a point estimate, 
is usually determined. 

To determine the symmetrical 95 per cent confidence interval, we 
proceed as follows. We locate the lower limit $M_L$ of the interval such 
that the probability of a sample mean equal to or greater than the 
observed sample mean M is .025. By the table of normal areas, this 
limit lies $1.96s_M$ below M; i.e., .025 of the area under the curve lies 
beyond 1.96 standard units. Hence the lower 95 per cent confidence 
limit $M_L$ for the population mean is $M - 1.96s_M$. (See Figure 10.3.) 

This is the same result as would be obtained if M were considered 
to be the mean of the sampling distribution and $M_L$ located at a 
distance $1.96s_M$ below M. But if this were done, the reasoning would 
be incorrect. If μ is in fact equal to $M_L$, $M_L$ is the mean of the sampling 
distribution, as indicated in the figure. The hypothesized or assumed 
value of μ must always be thought of as the mean of the sampling 
distribution. Otherwise the implication would be that μ is a variable, 
which would of course be nonsense. The population mean can have 
one and only one value. 

By similar procedure and reasoning the upper limit $M_U$ for the 
population mean is located at $M + 1.96s_M$. 


We may express the procedure in the formulas 

$$
\begin{aligned}
M_L &= M - 1.96s_M, \\
M_U &= M + 1.96s_M,
\end{aligned}
\tag{10.4}
$$

which give the 95 per cent limits for μ in terms of the observed sample 
mean M. The interval, $M \pm 1.96s_M$, is the 95 per cent confidence 
interval for μ. We can be confident that, 95 per cent of the time, 
intervals so constructed will include the parameter. 

[Figure 10.3: two sampling distributions, one centered at $M_L$ and one at $M_U$, each with a tail area of 0.025 cut off at the sample mean M, which lies $1.96s_M$ from each limit.]

Fig. 10.3. The lower and upper limits of the 95 per cent confidence interval 
for the population mean are $1.96s_M$ below and above the sample mean. 


EXAMPLE. The mean of the 138 VAT scores, Table A, Appendix, 
is 550.37 and the standard deviation is 92.73. Determine the 95 
per cent confidence interval for the population mean. 

The sample mean M is 550.37 and $s_M$ is $92.73/\sqrt{137}$ or 7.92. 
Substituting these values in formulas (10.4) we obtain $M_L = 534.8$ 
and $M_U = 565.9$. The 95 per cent confidence interval for the 
population mean μ thus is 534.8 to 565.9. 


The 90 per cent confidence limits for μ are located at $M \pm 1.64s_M$; 
the 99 per cent limits at $M \pm 2.58s_M$; and so on, as can readily be 
deduced from a table of normal areas. 
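
Formulas (10.4), generalized to any level of confidence, may be sketched as below for the VAT scores. The function name and the `level` parameter are our own. The last lines check the reasoning behind the lower limit: if μ were equal to $M_L$, the probability of a sample mean as large as the one observed would be .025.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval_mean(sample_mean, sample_sd, n, level=0.95):
    """Symmetrical confidence limits for a population mean (large samples)."""
    se = sample_sd / sqrt(n - 1)               # formula (10.2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = .95
    return sample_mean - z * se, sample_mean + z * se


for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval_mean(550.37, 92.73, 138, level)
    print(f"{level:.0%} interval: {low:.1f} to {high:.1f}")

# Check of the reasoning: take mu equal to the lower 95 per cent limit and
# find the probability of a sample mean of 550.37 or more.
lower_95, upper_95 = confidence_interval_mean(550.37, 92.73, 138)
se = 92.73 / sqrt(137)
tail = 1 - NormalDist(mu=lower_95, sigma=se).cdf(550.37)
print(f"tail area beyond the observed mean: {tail:.3f}")
```

The higher the level of confidence, the wider the interval printed, as the earlier discussion of precision would lead us to expect.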

Confidence intervals for a parameter are, as a rule, more in- 
formative than the test of a particular hypothesis about the parameter. 
(See Ref. 33.) They serve to separate admissible or tenable hypotheses 
from untenable. In the example above, any hypothesis proposing a 
value of μ which falls in the interval 534.8 to 565.9 is tenable in the 
sense that it cannot be rejected at the 5 per cent level; any hypothesis 
proposing a value outside the interval is untenable in the sense that it 
can be rejected at the 5 per cent level. It will be seen that a confidence 
interval corresponds to the region of acceptance in Diagram (b) of 
Figure 9.2. 


INFERENCES ABOUT POPULATION PROPORTIONS 


Consider a population such that each member has or does not 
have a given character. Such populations, known as twofold or bi- 
nomial populations, are common in research. As examples, we may 
cite voters who are for or not for a certain proposal or a certain 
political candidate; youth who are brought or not brought before a juvenile 
court; individuals who are Protestant or not Protestant; children who 
are left-handed or not left-handed; manufactured articles which are 
or are not defective; hospital patients who respond or do not respond 
to a given treatment; and subjects who answer “yes” or answer “no” 
to a given question. In twofold populations, every member is 
characterized by the presence or absence of the characteristic under 
consideration. 

The problem in investigating twofold populations is that of 
drawing inferences about the proportion of individuals who possess 
the given character. Let us represent this population proportion or 
parameter by φ. If we draw random samples of a given size N from a 
twofold population, the sample proportions are distributed in what is 
known as a binomial sampling distribution. (See Ref. 44.) Now if 
Nφ or N(1 − φ), whichever is smaller, is about 5 or more, the 
binomial distribution is nearly normal in shape with mean φ and standard 
deviation $\sqrt{\phi(1 - \phi)/N}$. Hence, hypotheses about a population 
proportion, φ, may be tested by the z statistic, 

$$z = \frac{|p - \phi| - \frac{1}{2N}}{\sqrt{\phi(1 - \phi)/N}}, \tag{10.5}$$

where p is the sample proportion, φ the hypothesized proportion, and 
N the size of the sample. The term 1/(2N) is needed to improve the fit 
of the continuous normal curve to the discrete binomial distribution. 
Notice that the term is subtracted from the absolute value of the 
difference p − φ. 

The procedures in testing a hypothesis about a population pro- 
portion are similar to those in testing a hypothesis about a population 
mean. Consider the following example: 


EXAMPLE. A new tranquilizing drug is put to trial in a sample of 
50 emotionally disturbed subjects. Thirty-eight of the 50, or 76 per 
cent, responded favorably. Is this percentage significantly higher 
than the 60 per cent known to respond favorably to a competing 
drug? 

To answer the question, we need to test the null hypothesis 
Ho: $ = .60 against the one-sided alternative hypothesis Hat > 
-60; i.e., we need to use a one-sided test. The one-sided test will 
give maximum protection against accepting Ho if à is greater than 
.60, the alternative of first concern. 

Since Nφ equals 30 and N(1 - φ) equals 20 (both products much
larger than the “about 5” needed to insure that the sampling
distribution is sensibly normal), the z test of formula (10.5) is
clearly appropriate. We have p = .76, φ = .60, and N = 50, so
that

    z = \frac{|.76 - .60| - \frac{1}{2(50)}}{\sqrt{.60(1 - .60)/50}} = \frac{.15}{.06928} = 2.17.




Turning to Table C, Appendix, we find that the probability figure
corresponding to 2.17 is .0150. We may reject H0 at the 1.5 per cent
level and conclude that φ is greater than .60. It is highly probable
that more than 60 per cent of similar subjects would respond favorably
to the new drug.
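
The arithmetic of the drug example can be checked with a short Python sketch. The function name `proportion_z` and its argument names are illustrative and do not come from the text. The tail area is taken from Python's standard library rather than from Table C, so its last digit may differ slightly from the tabled figure.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(p, phi, n):
    """z of formula (10.5): test of a proportion with the 1/2N correction."""
    correction = 1 / (2 * n)                  # the 1/2N continuity term
    return (abs(p - phi) - correction) / sqrt(phi * (1 - phi) / n)


z = proportion_z(p=38 / 50, phi=0.60, n=50)
p_one_sided = 1 - NormalDist().cdf(z)         # area beyond z in one tail
print(round(z, 2), round(p_one_sided, 3))
```

The one-sided probability is used because the alternative of concern is only that φ exceeds .60.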

As noted earlier in connection with inferences about population
means, confidence intervals for a parameter are generally more
informative than tests of null hypotheses, unless a definite probability
figure is desired. This is particularly true for population proportions.

If we replace φ in the denominator of (10.5) with the sample
proportion, p, solve for φ, and consider z as either negative or positive,
we will obtain formulas for the approximate lower and upper confidence
limits, p_L and p_U, for φ in terms of the sample proportion.
The formulas turn out to be

    p_L = p - \frac{1}{2N} - z\sqrt{\frac{p(1 - p)}{N}}
                                                              (10.6)
    p_U = p + \frac{1}{2N} + z\sqrt{\frac{p(1 - p)}{N}}

The 90 per cent limits are obtained by taking z as 1.64; the 95 per cent
limits by taking z as 1.96; and so on.

Formulas (10.6) give satisfactory results unless p is extreme or
the sample small. When p is between about .25 and .75 and N is about
20 or more, the limits are accurate enough for practical purposes.
When the conditions are not met, exact binomial tables (Ref. 1) or
charts based on such tables (Ref. 44) should be used.

To use the formulas, we need only the sample proportion and
sample size. Let us find the 95 per cent confidence limits for φ in the
drug example above. In that example, p = .76, N = 50, and, since
we want the 95 per cent limits, z = 1.96. Substituting, we have

    p_L = .76 - \frac{1}{2(50)} - 1.96\sqrt{.76(1 - .76)/50}

    p_U = .76 + \frac{1}{2(50)} + 1.96\sqrt{.76(1 - .76)/50},

from which we get p_L = .63 and p_U = .89. The interval .63 to .89
is the 95 per cent confidence interval for φ. Any hypothesis proposing
a value of φ outside the interval can be rejected at the 5 per cent level
on a two-sided test, or at the 2½ per cent level on a one-sided test,
since in using a one-sided test we are concerned with only the less
than (or greater than) alternative. (See p. 229.)
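
The limits for the drug example can be reproduced with the following sketch of formulas (10.6). The function name `proportion_limits` is hypothetical, chosen only for this illustration.

```python
from math import sqrt


def proportion_limits(p, n, z):
    """Approximate confidence limits for a proportion, formulas (10.6)."""
    # Half-width = continuity term 1/2N plus z standard errors.
    half_width = 1 / (2 * n) + z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


lower, upper = proportion_limits(p=0.76, n=50, z=1.96)
print(round(lower, 2), round(upper, 2))
```

Passing z = 1.64 instead of 1.96 would give the 90 per cent limits.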


INFERENCES ABOUT POPULATION PRODUCT-MOMENT 
COEFFICIENTS OF CORRELATION 


In sampling from a normal bivariate population, the form of the
sampling distribution of the product-moment coefficient of correlation
is independent neither of the population parameter ρ_xy (rho) nor the
size N of the sample. If the absolute value of ρ_xy is large or if N is
small, the distribution is markedly nonnormal.

Fisher (Ref. 11) has shown that a simple logarithmic transformation
of the product-moment coefficient of correlation is distributed
normally, to a very close approximation, regardless of the size of the
sample or the value of ρ_xy. Fisher's transformation, which we shall
designate the z_r transformation, is

    z_r = 1.1513 \log_{10} \frac{1 + r_{xy}}{1 - r_{xy}}, \qquad (10.7)

in which r_xy is the sample coefficient. The mean of the z_r sampling
distribution is the population parameter z_ρ, and the standard deviation
is

    s_{z_r} = \frac{1}{\sqrt{N - 3}}. \qquad (10.8)


As a check and clarification of theory, 92 random samples of 25
each were taken from a normal bivariate population in which ρ_xy
was .85. The distribution of the 92 sample r_xy's is shown in the
left-hand half of Table 10.2 and in Figure 10.4. It will be noted that the
distribution is markedly skewed to the left. This would be expected.
Recalling that the limits of the correlation coefficient are -1 and +1,
we see that the sample coefficients cannot exceed the population value
.85 by more than .15, but can fall below .85 by as much as 1.85. This
situation of course invites skewness.

The 92 sample r_xy's were transformed to z_r equivalents by use of
Table E, Appendix. (If such a table were not available, the
transformation could be made by use of formula [10.7].) The distribution
of the 92 z_r equivalents is shown in the right-hand half of Table 10.2
and in Figure 10.5. The mean of the distribution of z_r is 1.26, in
agreement to three-figure accuracy with the population value, and the
standard deviation is .25, in fair agreement with the value .21 obtained
from formula (10.8).
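
As a check on the figures just quoted, the transformation and its standard deviation can be sketched in Python. The function names are illustrative. The constant 1.1513 in (10.7) is one-half the natural logarithm of 10, so the transformation agrees with the inverse hyperbolic tangent, `math.atanh`, which is shown for comparison.

```python
from math import atanh, log10, sqrt


def z_r(r):
    """Fisher's transformation as written in formula (10.7)."""
    return 1.1513 * log10((1 + r) / (1 - r))


def se_z_r(n):
    """Standard deviation of z_r, formula (10.8)."""
    return 1 / sqrt(n - 3)


print(round(z_r(0.85), 2))      # transformed population value for rho = .85
print(round(atanh(0.85), 2))    # same quantity from the standard library
print(round(se_z_r(25), 2))     # theoretical standard deviation for N = 25
```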

Regardless of size of sample and the value of ρ_xy, the sampling
distribution of z_r is very nearly normal in form, with mean z_ρ and
standard deviation 1/√(N - 3). Hence, we may confidently utilize
the normal probability table in making inferences from a
product-moment correlation coefficient, provided we transform observed and

TABLE 10.2


Distribution of r_xy and z_r Equivalents in
92 Samples of 25 Each from a Population
in which ρ_xy = .85


VALUE OF r_xy    FREQUENCY    VALUE OF z_r    FREQUENCY
  .93-.97            7         1.90-1.99          1
  .88-.92           22         1.80-1.89          2
  .83-.87           32         1.70-1.79          1
  .78-.82           16         1.60-1.69          3
  .73-.77            8         1.50-1.59          7
  .68-.72            3         1.40-1.49          9
  .63-.67            1         1.30-1.39         15
  .58-.62            2         1.20-1.29         20
  .53-.57            0         1.10-1.19         13
  .48-.52            1         1.00-1.09         11
                                .90- .99          4
                                .80- .89          2
                                .70- .79          3
                                .60- .69          0
                                .50- .59          1

[Histogram: frequency plotted against value of r_xy.]

Fig. 10.4. Distribution of r_xy in 92 random samples. (From Table 10.2.)




[Histogram: frequency plotted against value of z_r.]

Fig. 10.5. Distribution of z_r in 92 random samples. (From Table 10.2.)


hypothesized values to their z_r equivalents. The procedures parallel
those described in connection with the mean and the proportion.


EXAMPLE 1. The coefficient of correlation between the VAT scores
and first semester averages in the sample of 138 college freshmen,
Table A, Appendix, is .46. At what level can we reject the
hypothesis that, in the population represented by this sample, the
coefficient is .60?

We shall test the null hypothesis H0: ρ_xy = .60 against the
two-sided alternative HA: ρ_xy ≠ .60. The two-sided test is appropriate
because we have no more reason to worry about accepting
H0 if ρ_xy is less than .60 than about accepting H0 if ρ_xy is greater
than .60.

According to Table E, the z_r values of the observed and hypothesized
coefficients are .50 and .69, respectively. By formula
(10.8), s_zr = 1/√(138 - 3) = .086. Hence, z = (.50 - .69)/.086
or -2.21. Turning to Table C, Appendix, we find the corresponding
P to be .0136. Doubling P for the two-sided test, we get .027. H0
can be rejected at about the 3 per cent level. We conclude that it is
quite unlikely that ρ_xy is .60 in the population.
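
The test in Example 1 can be sketched as follows. The function name `correlation_z` is illustrative. Because the sketch carries the transformed values to full precision instead of the two-decimal entries of Table E, it gives a z near -2.28 and a two-sided probability near .023, somewhat beyond the -2.21 and .027 obtained above; the conclusion is unchanged.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_z(r, rho, n):
    """z ratio for H0: rho_xy = rho, using the z_r transformation."""
    # Difference of transformed values divided by 1/sqrt(N - 3).
    return (atanh(r) - atanh(rho)) * sqrt(n - 3)


z = correlation_z(r=0.46, rho=0.60, n=138)
p_two_sided = 2 * NormalDist().cdf(-abs(z))   # both tails
print(round(z, 2), round(p_two_sided, 3))
```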


EXAMPLE 2. What are the 95 per cent confidence limits for ρ_xy in
example (1) above?

Since the sample z_r is .50 and s_zr is .086, we have, paralleling
the procedures of formulas (10.4),

    z_L = .50 - 1.96(.086) = .33
    z_U = .50 + 1.96(.086) = .67.

Turning these limits back to r_xy's, we get .32 and .58, respectively,
as the lower and upper 95 per cent confidence limits for ρ_xy.
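
The same limits can be obtained without Table E, as in this sketch. The function name `correlation_limits` is hypothetical. The hyperbolic tangent, `math.tanh`, is the inverse of the z_r transformation and does the work of turning the limits back into correlation coefficients.

```python
from math import atanh, sqrt, tanh


def correlation_limits(r, n, z=1.96):
    """Confidence limits for rho_xy by way of the z_r transformation."""
    center = atanh(r)              # sample z_r
    margin = z / sqrt(n - 3)       # z times the standard error of z_r
    return tanh(center - margin), tanh(center + margin)


lower, upper = correlation_limits(r=0.46, n=138)
print(round(lower, 2), round(upper, 2))
```

Note that the limits are not symmetrical about the sample value .46, which reflects the skewness of the sampling distribution of r_xy.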


The z_r transformation is applicable to partial correlation
coefficients, since these are essentially product-moment coefficients. The
standard error of z_r corresponding to r_12.3 is included in Table 10.3.
Using this standard error, we may make inferences about the parameter
ρ_12.3 in exactly the same way that we make inferences about ρ_xy.

Before leaving the z_r transformation, let us emphasize the fact
that it is applicable only to the product-moment coefficient of
correlation for continuous, normally distributed data. It cannot be
applied to the various other coefficients described in Chapter VII, and
hence these coefficients do not permit exact inferences. This fact
marks another advantage r_xy has over other measures of relationship.


STANDARD ERRORS OF COMMONLY USED STATISTICS 


The standard deviation of the sampling distribution of a statistic 
is generally known as the standard error of the statistic. The standard 
errors of various statistics whose sampling distributions approach 
normality when N is about 30 or more are included in Table 10.3. 

The procedures in using these standard errors to test hypotheses 
and to set up confidence intervals for parameters are similar to those 
described in connection with the mean, proportion, and correlation
coefficient. An example will serve to illustrate the procedures again.


EXAMPLE. In a three-variable correlation problem involving 46 subjects,
the following statistics were observed: b_12.3 = .31, b_13.2 = .40,
r_23 = .80, and R_1.23 = .65. Are the partial regression coefficients
significant at the 5 per cent level?

When we ask whether a correlation or regression coefficient is
significant we mean is it significantly different from 0. The hypotheses
to be tested are H0: β_12.3 = 0 and H0: β_13.2 = 0 against
the two-sided alternative that the β's are not equal to 0. By the
formula given in Table 10.3, the standard error of both coefficients
is √((1 - .65²)/[43(1 - .80²)]) or .19. Hence, the z values are

    z = \frac{.31 - 0}{.19} = 1.63

    z = \frac{.40 - 0}{.19} = 2.11.

According to Table C, Appendix, the corresponding P's are .0516




TABLE 10.3


Standard Errors of Various Statistics whose
Sampling Distributions Approximate Normality


STATISTIC                              STANDARD ERROR

Arithmetic mean, M                     s_M = s/√(N - 1)
Median, Mdn                            s_Mdn = 1.25s/√(N - 1)
Quartiles, Q1 and Q3                   s_Q1 = s_Q3 = 1.36s/√(N - 1)
Standard deviation, s                  s_s = s/√(2N)
Average or mean deviation, AD          s_AD = .76AD/√(N - 1)
Quartile deviation, Q                  s_Q = 1.2Q/√(N - 1)
Decile deviation, D                    s_D = [...]D/√(N - 1)
z_r transformation of r_xy             s_zr = 1/√(N - 3)
z_r transformation of r_12.3           s_zr = 1/√(N - 4)
Regression coefficient, b_yx           s_b = (s_y/s_x)√((1 - r_xy²)/(N - 2))
Partial regression coefficients,       s_b = √((1 - R²_1.23)/[(N - 3)(1 - r²_23)])
  b_12.3 and b_13.2


and .0174. Doubling the P's for the two-sided test, we get .103 and
.035 and conclude that b_12.3 is not significant at the 5 per cent level,
but that b_13.2 is.

Notice that the standard error attaching to the coefficients is
.19. This indicates that the estimates of the population coefficient
provided by the sample may be in substantial error. In such situations,
even if both coefficients are si [...]
equal, the less the correlation between predictor variables, the
greater the reliability of the coefficients.
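
The computations of this example can be retraced with the sketch below. The function names are illustrative, and the standard-error expression is simply the one substituted into in the example, with N - 3 = 43. The standard error is rounded to .19 before the z ratios are formed, as was done in the text.

```python
from math import sqrt
from statistics import NormalDist


def se_partial_b(R, r23, n):
    """Standard error used in the example: sqrt((1 - R^2)/((N - 3)(1 - r23^2)))."""
    return sqrt((1 - R**2) / ((n - 3) * (1 - r23**2)))


def two_sided_p(z):
    """Two-sided normal probability for a z ratio."""
    return 2 * NormalDist().cdf(-abs(z))


se = round(se_partial_b(R=0.65, r23=0.80, n=46), 2)   # rounded to .19 as in the text
for name, b in (("b12.3", 0.31), ("b13.2", 0.40)):
    z = b / se
    print(name, round(z, 2), round(two_sided_p(z), 3))
```

Raising r23 in the call to `se_partial_b` enlarges the standard error, which illustrates the closing remark that highly correlated predictors yield less reliable coefficients.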


For two reasons the use of the normal probability scale in making 
inferences from any of the statistics, except the z, transformations, 
in Table 10.3 must be considered as giving only approximate results. 
First, the standard errors of the statistics are stated in terms of sample 
statistics, rather than the unknown population parameters which are 
assumed in their derivation. Consequently the standard errors as 
stated are themselves inexact. Second, the sampling distributions of 
the statistics are only approximately normal in form. As sample size 
increases, the precision of the standard errors increases and the sam- 
pling distributions approximate normality more closely. When sample 
size is, say, 60 or more, the use of the normal probability scale ordi-
narily gives results which compare favorably with those obtained by
more exact methods. When sample size is, say, 30 or less, the normal 
probability scale is not appropriate. 

The “60-or-more” and “30-or-less” statements are only rough
guides. Approximations of varying degrees of exactness do not, of
course, suddenly become acceptable or unacceptable. Moreover, the
standard errors and sampling distributions of different statistics are
affected differently by changing sample size. Then, too, quite inexact
inferences may be satisfactory in quick and rough analysis. The only
general statement that can be made is that the results obtained by use
of the normal probability scale are generally inexact, although they
become steadily better with increasing sample size. More exact methods
of making inferences from some of the statistics listed in Table 10.3,
applicable to both large and small samples, will be discussed later.


SAMPLE SIZE NEEDED FOR A GIVEN RELIABILITY 


As has been noted, the reliability of a sample statistic as an estimate
of the corresponding population parameter ordinarily is expressed in
terms of a confidence interval which will include the parameter a
specified percentage of the time. Since the width of the confidence interval
varies with the standard error, which in turn varies with the size of the
sample, under certain conditions it is possible to anticipate the size
of the sample needed to have specified reliability. We shall illustrate
the procedure with reference to the mean.

Suppose that it is desired to determine approximately the size N
of a sample such that, 95 per cent of the time, the interval 3 units
below and 3 units above the sample mean will include the population
mean, and suppose it is known or can be guessed with fair accuracy
that the standard deviation σ in the sampled population is about 8.
Since the symmetrical 95 per cent confidence limits are at ±1.96 s_M,
we have, by substitution in formula (10.3),

    1.96 = \frac{3}{8/\sqrt{N - 1}},

so that the value of N, in round numbers, is 28.
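
Solving the equation above for N gives N = 1 + (1.96σ/3)², which can be sketched in Python. The function name `n_for_mean` and its argument names are illustrative only.

```python
def n_for_mean(half_width, sigma, z=1.96):
    """Solve z = half_width / (sigma / sqrt(N - 1)) for N."""
    return 1 + (z * sigma / half_width) ** 2


n = n_for_mean(half_width=3, sigma=8)
print(round(n))
```

Halving the desired half-width roughly quadruples the required N, since the half-width enters as a square.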

By similar methods we can anticipate the size of the sample needed
to provide an estimate, of specified precision, of a population
proportion. Suppose we wish to determine the 90 per cent confidence
interval for the proportion of words in a children's encyclopedia that
are not on a standard word list, and suppose we want the interval to
be no wider than .06, extending .03 in both directions from the sample
proportion. If our estimate or guess of the proportion is .15, how
large a sample is needed?

Substituting in formula (10.5) and disregarding the correction
term 1/2N, we have

    1.64 = \frac{.03}{\sqrt{.15(1 - .15)/N}}.

Solving for N, we get 380. This means that a random sample of 380
words from the encyclopedia will provide a 90 per cent confidence
interval for φ extending no more than .03 on either side of the sample
proportion, provided our estimate, .15, of φ is not too small.

When the statistics from a preliminary study are available, or
when their values can be guessed with fair accuracy, the size of the
sample needed to impart a given reliability to any of the statistics
listed in Table 10.3 can be anticipated by procedures essentially similar
to those described above.
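
The sample size for the encyclopedia problem follows from solving the equation for N, which gives N = (1.64/.03)² (.15)(.85). A minimal sketch, with the illustrative function name `n_for_proportion`, is given below. Carried to full precision the result is about 381, which agrees with the round figure of 380 quoted above.

```python
def n_for_proportion(half_width, phi, z=1.64):
    """Solve z = half_width / sqrt(phi(1 - phi)/N) for N, ignoring the 1/2N term."""
    return (z / half_width) ** 2 * phi * (1 - phi)


n = n_for_proportion(half_width=0.03, phi=0.15)
print(round(n))
```

Because φ(1 - φ) is largest at φ = .5, a guess of .5 gives the most cautious (largest) sample size when little is known about the proportion.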


uncertainty in such calcula- 
y study, particularly if the 
, may be substantially different from those observed 


ide, it is well to take samples 


sample size are quite un- 
value in the initial stages 
In some situations, it may be found 
possible, and time and money thereby 




saved; in others, it may be found that samples of size sufficient to
impart a needed reliability to the statistics in question are impossible, 
and wasted work thereby avoided. 


Assumptions Underlying the Normal Sampling Distribution 


When a sample of about 30 or more reliable observations is drawn
from a very large normal population, various sample statistics are 
distributed in normal or nearly normal form with the approximate 
standard errors listed in Table 10.3. In the preceding section, we took 
these conditions for granted. Before considering further uses of the 
normal sampling distribution, we need to examine the effects of 
population nonnormality, relatively small populations, nonrandom 
sampling, and errors of measurement on the distributions and standard 
errors of sample statistics. 


NORMALITY AND EFFECTS OF NONNORMALITY 


When a sample is less than about 60, there are no very satisfactory
ways to determine whether the sample departs sufficiently from
normality to discredit the assumption that its parent population is
normal. In this case, the assumption can be examined only in the
light of what is known or believed to be true about the nature of the
sample measures. For larger samples there are several ways to test
the assumption, the most useful of which are, perhaps, the χ² test
of “goodness of fit” and tests based upon the g measures of skewness
and kurtosis. The g tests are more sensitive than the χ² test and have
the additional advantage of indicating whether nonnormality is due to
skewness or to nonnormal peakedness or to both. The χ² test will be
discussed in a later section; at this point we shall consider the g tests.

E. S. Pearson has determined the 90 and 98 per cent sampling
limits of the g's for various size samples, and the limits shown in
Table D, Appendix, are taken from his work. The table is read as
follows: In samples of 100 from a normal population, 90 per cent of
the g1 values may be expected to fall between -.39 and +.39 and
90 per cent of the g2 values may be expected to fall between 2.35 and
3.77, and so on.

To test the assumption of normality, we need only to compute
the g1 and g2 statistics of the sample and refer to Table D. As an
example, consider the sample data of Table 4.5 where N = 138,
g1 = .32, and g2 = 2.64. Referring these values to Table D, we find
that both of the g's fall within the 90 per cent sampling limits for
samples of this size; hence, the assumption of population normality
is tenable.
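
The g statistics themselves are easily computed, as the sketch below suggests. It assumes, as the tabled limits centered near 0 and 3 imply, that g1 is the third moment about the mean divided by the 3/2 power of the second, and that g2 is the fourth moment divided by the square of the second; the function names are illustrative. The limits for the particular sample size must still be read from Table D. In the last lines the limits quoted above for N = 100 are used only to show the comparison, not as the proper limits for N = 138.

```python
def g_statistics(scores):
    """Return (g1, g2) computed from moments about the mean (assumed definitions)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def within(value, limits):
    """True when value lies inside the (lower, upper) tabled limits."""
    low, high = limits
    return low <= value <= high


# Statistics quoted in the text; the limits are those for N = 100, for illustration.
g1, g2 = 0.32, 2.64
print(within(g1, (-0.39, 0.39)), within(g2, (2.35, 3.77)))
```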

It is the recommended practice to conclude that the assumption 
of normality is tenable when both sample g's fall within the 90 per 
cent limits; is in doubt when either falls beyond the 90 per cent limits;
and is untenable when either falls beyond the 98 per cent limits. To 
use Table D in one-sided tests, the upper or lower tabled limit is taken 
as the criterion, and the probability is halved. Usually, however, the 
two-sided test is the more appropriate. 

Although normal sampling distribution theory rests on population 
normality, there is sufficient reason to conclude that population non- 
normality does not seriously affect the sampling distribution of the 
mean and various other statistics, provided the sample is not small. 
Wallis and Roberts remark (Ref. 51, pp. 357-358): 

This approach to normality occurs in the sampling distributions 
of many of the statistics which are of practical importance. There is, in 
fact, a general law that, almost regardless of the shape of the original 
population, the shape of sampling distributions derived from it by con- 
sidering the statistics commonly computed from samples will be ap- 
proximately normal. This law can be proved mathematically, and the 
conditions under which it holds stated more precisely. It is known as 
the central limit theorem. . . . The important fact stated by the central
limit theorem, that the sampling distributions of common statistics tend
to be approximately normal, almost regardless of the shape of the
original population, results in enormous simplification. . . .


There have been many experiments in sampling from nonnormal 
populations which confirm the central limit theorem. Unfortunately, 
the theorem does not tell us how large samples must be before sampling 
distributions become essentially normal and sampling experiments do 
not provide general evidence. The “60-or-more” and “30-or-less” 
statements, p. 249, appear to apply here, as well as in connection with 
the use of the standard errors of Table 10.3, but there is no way of 
being sure of how large samples must be to insure dependable results. 

As a rule, when a sample of scores indicates marked nonnormality
in the population, the investigator should attempt to transform the
scores to more nearly normal form. One possible normalizing
transformation, T-scaling, was described in Chapter VI. Mueller (Ref. 32)
discusses various other transformations at some length. In addition to
permitting dependable tests of significance and accurate estimation,
distributions that approximate normality are more readily described
and analyzed than definitely nonnormal ones. However, when appropriate
transformations cannot be found, the z tests of hypotheses and methods
of estimation of the preceding and following sections appear to be
fairly trustworthy, provided the sample is about 30 or more in size
and the population at least ten times the size of the sample.

In later sections, we shall describe several simple tests which do 
not assume population normality. These are known as nonpara- 
metric, or distribution-free, tests and although they are not as powerful 
as z tests, they frequently are extremely useful. 


SAMPLING FROM FINITE POPULATIONS 


In our discussion of the normal sampling distribution, we as- 
sumed an infinite population or, at least, a population so large as 
compared with the sample that the probabilities of drawing particular 
scores remain practically constant during the sampling process. When
the population is finite, the standard errors of the various statistics 
are somewhat smaller than those given in Table 10.3. Corrections can 
be worked out, but are rarely worth making. When the population is 
only 10 times the size of the sample, the standard errors of means and 
proportions, for example, are about .95 of their original values; when 
the population is 50 times the size of the sample, the standard errors 
are about .99 of their original values. Wallis and Roberts (Ref. 51) 
discuss the nature and effects of corrections for finite populations. 
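
The two figures just quoted can be checked against the usual finite population multiplier, the square root of (population size minus sample size) over (population size minus 1). That expression is not given in this section and is introduced here as an assumption, as are the function name and the illustrative sample size of 100.

```python
from math import sqrt


def finite_population_factor(population_size, sample_size):
    """Usual multiplier for a standard error when sampling without replacement."""
    return sqrt((population_size - sample_size) / (population_size - 1))


n = 100                                        # illustrative sample size
for ratio in (10, 50):                         # population is ratio times the sample
    print(ratio, round(finite_population_factor(ratio * n, n), 2))
```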


NONRANDOM SAMPLING 


Several times in preceding pages we have stressed the fact that 
random sampling from specified populations is fundamental in the 
logic of statistical inference. Unfortunately, a great many research 
studies in psychology and education are limited to data already col- 
lected or to data available from convenient intact groups representing 
no known population. Little can be said about the consequences of 
nonrandom sampling from unknown populations. It is a fair guess 
that nonrandom sampling generally reduces variation and results in
too small sampling errors; however, conscious or purposive selection 
probably would tend to increase variation and to result in too large 
errors. There is no known way to correct sampling distributions and 
standard errors deduced from random samples for the effects of non- 
randomness. 

It has been argued that one should insist on a stringent level of 


significance, say the .001 level, before concluding that differences 




observed in nonrandom samples are not due to chance. As a working 
rule, such procedure would seem unwise. It is impossible to make a 
general statement about the effect of nonrandomness on tests of 
significance, and stringent levels would increase the risk of the Type II
error. 

Perhaps the best that can be done is to identify a real population 
and thoroughly examine the representativeness of the sample in hand. 
Or, putting it the other way around, one should at least decide what 
real population is represented by the sample. If somewhere in the study 
it is possible to select at random the intact group or set of data to be 
studied, this should be done, both to lessen the likelihood of bias and 
to make less violent the use of sampling distributions, generated by 
random sampling, in drawing inferences from nonrandom samples. 

Finally, the nonrandom sample should be described in detail with 
respect to the features which may have influenced the findings. As 
pointed out in Chapter I, in reasoning from a nonrandom sample to 
the population or to other groups, we proceed essentially by analogy; 
i.e., we reason from particular to particular. Detailed description of 
the sample is necessary before we can judge whether there are real 
similarities and no crucial differences between particulars involved. 


ERRORS OF MEASUREMENT 


In the preceding problems of inference, no reference was made to 
errors of measurement, it being implicitly assumed that the obtained 
measures in the sample were perfectly reliable. In applied statistics 
the assumption is, of course, never satisfied, and inferences are subject 
to the effects of errors of measurement. 

Suppose that we have the obtained measure or score for each of N 
individuals in a sample. As was seen in Chapter VIII, an error of 
measurement attaches to each of these scores, and consequently their 
mean may be in error as an estimate of the mean of the true scores of 
the individuals. In other words, if the individuals were measured a 
second time, or repeatedly, the scores and their mean would likely 
differ to some extent from those first obtained. The same may be said 
for other sample statistics. 

If the errors of measurement are normally distributed and equally
variable throughout the range of observed scores, the extent of error
attaching to a single score is indicated by the standard error of
measurement, s_e, and the extent of error attaching to the mean of the
scores by s_e/√N. If, for example, s_e is 5 in a sample of 30 scores,
the error attaching to the mean due to unreliability of measurement is
5/√30 or only about .9. In general, an average is a reasonably accurate
estimate of the true average, provided N is not small and s_e not
relatively large.
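
A few lines of Python show how quickly the error attaching to the mean shrinks as N grows. The function name `error_of_mean` is illustrative, and the sample sizes other than 30 are chosen only for comparison.

```python
from math import sqrt


def error_of_mean(s_e, n):
    """Error attaching to a mean of n scores, each with standard error of measurement s_e."""
    return s_e / sqrt(n)


for n in (30, 100, 400):
    print(n, round(error_of_mean(5, n), 2))
```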

However, there is another source of inaccuracy in the z ratio,
such as that of (10.3). In Chapter VIII, we saw that errors of
measurement inflate the standard deviation. It follows that z ratios having
s in their denominator are decreased by such errors, since s is
increased. Also decreased are z ratios having z_r in their numerator,
since errors of measurement deflate or attenuate the correlation
coefficient.

Errors of measurement and sampling errors, in the usual sense, 
cannot be considered simultaneously, and there are no generally 
satisfactory ways of adjusting a z ratio for unreliability of the basic 
data. When a z value turns out to be significant, it likely would have 
been more significant had the original measures been more reliable; 
but when a z value turns out to be nonsignificant, it may be that it 
would have been significant given more reliable measures. There is 
no substitute for reliable data in research. 


Inferences from Two Independent Samples 


Two samples are said to be independent when the scores in one 
are not paired in any way with the scores in the other. In other words, 
the samples are unrelated. (Cf. p. 263.) The most common problem
in research, perhaps, is to determine whether two independent samples 
differ sufficiently in some characteristic to discredit the hypothesis 
that the samples were drawn from populations similar in the charac- 
teristic chosen for comparison. If the difference between the samples 
is too great to be reasonably attributed to sampling fluctuations, i.e., 
chance, the null hypothesis is rejected, and the conclusion follows that 
a difference exists between the populations from which the samples 
were drawn. 

For example, if the difference between a sample which has under- 
gone an experimental treatment and a control sample which has not 
is too great to be reasonably attributed to chance, it follows that the 
treatment is responsible for a real effect which will be observed again
if further samples are so treated. As another example, if the difference 
in problem-solving ability between samples from populations of boys 
and girls cannot reasonably be ascribed to chance, the conclusion 




follows that the populations do in fact differ in problem-solving 
ability. Differences which cannot be reasonably ascribed to chance are 
said to be significant. 


INDEPENDENT SAMPLE MEANS 


When sample observations are quantitative, the difference between
the sample means, M1 - M2, is the statistic ordinarily employed
in the two-sample comparison. It can be shown that if the
samples are independent and if the sampling distributions of M1 and
M2 are normal, the difference M1 - M2 has a normal sampling
distribution with mean equal to μ1 - μ2, the difference between
population means, and with standard deviation approximately equal to

    s_{M_1 - M_2} = \sqrt{\frac{s_1^2}{N_1 - 1} + \frac{s_2^2}{N_2 - 1}}, \qquad (10.9)

where s1, s2 and N1, N2 are the respective standard deviations and
sizes of samples 1 and 2. This standard deviation is called the standard
error of the difference between independent sample means.


The procedure in testing a hypothesis about a difference between
two population means is similar to that in testing a hypothesis about
a single population mean. The z ratio is

    z = \frac{(M_1 - M_2) - K}{s_{M_1 - M_2}}, \qquad (10.10)

where K is the hypothesized difference between population means.
Usually K is 0, but any difference may be hypothesized. As a rule, the
null hypothesis, H0: μ1 - μ2 = 0, is tested against the alternative
hypothesis HA: μ1 - μ2 ≠ 0. Hence, as a rule, the two-sided test is
appropriate. However, H0 may be tested against a one-sided alternative,
HA: μ1 - μ2 > K or HA: μ1 - μ2 < K, in case there is concern
about a difference between means in one direction but not the other.


EXAMPLE. For the 99 public school graduates of Table A, Appendix,
the mean M1 and the standard deviation s1 of the semester averages
are 75.23 and 7.62, respectively. For the 47 private school graduates,
the mean M2 and the standard deviation s2 are 72.85 and
9.02. Is the difference between these means significant at the 5 per
cent level?

Since both samples are well above 30 in size, the sampling
distributions of M1 and M2 are approximately normal; hence, the
sampling distribution of the difference, M1 - M2, is approximately
normal. We shall test H0: μ1 - μ2 = 0 against the alternative
HA: μ1 - μ2 ≠ 0; i.e., we shall use a two-sided test. By formula
(10.9), s_{M1 - M2} = √((7.62)²/98 + (9.02)²/46) or 1.54. Substituting
in (10.10), we have

    z = \frac{75.23 - 72.85}{1.54} = 1.55.

According to Table C, Appendix, a z of 1.55 corresponds to a P of
.061. Doubling for the two-sided test, we get P = .12. The difference
is not significant at the 5 per cent level, or even at the 10 per
cent level. There is little reason to think that the means of semester
averages are unequal in the populations of public school and private
school graduates represented by the samples.
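
The steps of this example can be sketched as follows. The function names are illustrative, and the z ratio is written in the signed form, (M1 - M2 - K) divided by the standard error of the difference.

```python
from math import sqrt
from statistics import NormalDist


def se_difference(s1, n1, s2, n2):
    """Standard error of M1 - M2, formula (10.9)."""
    return sqrt(s1**2 / (n1 - 1) + s2**2 / (n2 - 1))


def z_difference(m1, m2, se, k=0.0):
    """z ratio for a hypothesized difference k between population means."""
    return (m1 - m2 - k) / se


se = se_difference(s1=7.62, n1=99, s2=9.02, n2=47)
z = z_difference(75.23, 72.85, se)
p_two_sided = 2 * NormalDist().cdf(-abs(z))
print(round(se, 2), round(z, 2), round(p_two_sided, 2))
```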


In determining confidence intervals or limits for differences between
population means, the procedure is similar to that followed in
finding the confidence limits for any parameter. The 90 per cent
limits for the difference, μ1 - μ2, are given by (M1 - M2) ±
1.64 s_{M1 - M2}; the 95 per cent limits by (M1 - M2) ± 1.96 s_{M1 - M2};
and so on. In the above example, the 95 per cent limits for μ1 - μ2
are (75.23 - 72.85) ± 1.96(1.54) or -.64 and 5.40. Any hypothesis
proposing a value for the difference, μ1 - μ2, included in the interval
-.64 to 5.40 cannot be rejected at the 5 per cent level on a two-sided
test, while any hypothesis proposing a value not included can be.
Since 0 is included in the interval, the hypothesis that μ1 - μ2 equals
0 cannot be rejected, which is what we found out in the example.
See pp. 259-260 for more about the advantages of confidence intervals.
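
The limits for the difference between means can be reproduced with a short sketch. The function name `difference_limits` is hypothetical, and the standard error is entered as the rounded value 1.54 used in the text.

```python
def difference_limits(m1, m2, se, z=1.96):
    """Confidence limits for mu1 - mu2: (M1 - M2) plus and minus z standard errors."""
    diff = m1 - m2
    return diff - z * se, diff + z * se


lower, upper = difference_limits(75.23, 72.85, se=1.54)
print(round(lower, 2), round(upper, 2))
print(lower <= 0 <= upper)      # True when a zero difference cannot be rejected
```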


INDEPENDENT SAMPLE PROPORTIONS 


Suppose that we have two twofold populations in which the
proportions of individuals having a given characteristic are φ1 and φ2,
respectively. In successive independent samples of sizes N1 and N2
from these populations, the sampling distribution of the differences
between sample proportions, p1 - p2, is approximately normal,
provided the sampling distributions of p1 and p2 are themselves
approximately normal. (See p. 242.) The mean of the distribution of
differences is φ1 - φ2 and the standard deviation is

    \sigma_{p_1 - p_2} = \sqrt{\frac{\phi_1(1 - \phi_1)}{N_1} + \frac{\phi_2(1 - \phi_2)}{N_2}}. \qquad (10.11)


258 / Statistics in Education and Psychology 


Usually in making inferences about twofold populations, the
hypothesis to be tested is that $\phi_1 - \phi_2 = 0$, i.e., that the populations
are alike in respect to the proportion of individuals having the charac-
teristic in question. Under this hypothesis, $\phi_1 = \phi_2 = \phi$, say. The
best estimate of $\phi$ is the mean proportion, $\bar{p}$, of the samples as
obtained from

$$\bar{p} = \frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}.$$

Hence, in testing $H_0: \phi_1 - \phi_2 = 0$, the standard error of the
difference is taken as

$$s_{p_1-p_2} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{N_1} + \frac{\bar{p}(1 - \bar{p})}{N_2}}. \qquad (10.12)$$

The test of $H_0$ parallels the test of a hypothesis that two popu-
lation means are equal, illustrated above. The z ratio is

$$z = \frac{p_1 - p_2}{s_{p_1-p_2}}. \qquad (10.13)$$

Consider the following example:


EXAMPLE. Sixteen out of 35 youth from broken homes and 18 of
65 youth from unbroken homes were found to be delinquent. Is the
difference between these proportions significant at the 5 per cent
level?

The sample proportions are 16/35 or .46 and 18/65 or .28.
We need to test $H_0: \phi_1 - \phi_2 = 0$ against the two-sided alternative
$H_a: \phi_1 - \phi_2 \neq 0$. Since $\bar{p} = (16 + 18)/100 = .34$, we have the
z ratio

$$z = \frac{.46 - .28}{\sqrt{\dfrac{.34(.66)}{35} + \dfrac{.34(.66)}{65}}} = 1.81.$$

The P corresponding to 1.81 is, according to Table C, .035. Doubling
for the two-sided test, we get .07. The difference is not significant
at the 5 per cent level. However, the difference approaches
significance at the 5 per cent level, and the hypothesis might well
be tested again in further samples.
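
The computation in this example may be sketched in Python as follows.
The fragment is illustrative only and not part of the original text; the
function name `z_test_independent_proportions` is our own, and it works
from the unrounded sample proportions, so the z ratio may differ from the
hand computation in the third decimal place.

```python
import math
from statistics import NormalDist


def z_test_independent_proportions(k1, n1, k2, n2):
    """z test of H0: phi1 - phi2 = 0 using the pooled (mean) proportion."""
    p1, p2 = k1 / n1, k2 / n2
    # Mean proportion of the two samples, weighted by sample size.
    p_bar = (k1 + k2) / (n1 + n2)
    # Standard error under H0, as in formula (10.12).
    se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    # z ratio, as in formula (10.13).
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_bar, z, p_two_sided


# 16 of 35 youth from broken homes, 18 of 65 from unbroken homes.
p_bar, z, p = z_test_independent_proportions(16, 35, 18, 65)
print(f"mean proportion = {p_bar:.2f}, z = {z:.2f}, two-sided P = {p:.2f}")
```

The sketch should reproduce a mean proportion of .34, a z of about 1.81,
and a two-sided P of about .07.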


When the hypothesis to be tested is $H_0: \phi_1 - \phi_2 = K$, where K
is some value other than 0, the procedure illustrated in the example
should not be used. In this case, we replace $\phi_1$ and $\phi_2$ of formula
(10.11) with the sample proportions $p_1$ and $p_2$, and use this standard
error in the denominator of (10.13). This is also true in determining con-
fidence limits for the true difference. The confidence limits for the dif-
ference, $\phi_1 - \phi_2$, are
$(p_1 - p_2) \pm z\sqrt{p_1(1 - p_1)/N_1 + p_2(1 - p_2)/N_2}$,
where z is 1.64 for the 90 per cent limits; 1.96 for the 95 per cent limits;
and so on. (See Ex. 9.)

The limits are somewhat more accurate if the quantity
$(N_1 + N_2)/2N_1N_2$ is subtracted from the lower limit and added to the upper.
This correction is an adjustment for the discreteness of the sampling
distributions of proportions. It may be applied in testing hypotheses
by subtracting the quantity from the absolute difference between the
sample proportions. However, the correction rarely affects a decision
to accept or reject a hypothesis.
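
As an illustration of these limits, the sketch below applies them to the
delinquency data of the preceding example. It is not part of the original
text: the function name `proportion_difference_limits` and its `correct`
switch are hypothetical, and applying the limits to these particular data
is our own choice of illustration.

```python
import math


def proportion_difference_limits(k1, n1, k2, n2, z=1.96, correct=False):
    """Confidence limits for phi1 - phi2 from two independent samples."""
    p1, p2 = k1 / n1, k2 / n2
    # Standard error with the sample proportions substituted in (10.11).
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    half_width = z * se
    if correct:
        # Adjustment for discreteness: (N1 + N2) / (2 * N1 * N2).
        half_width += (n1 + n2) / (2 * n1 * n2)
    difference = p1 - p2
    return difference - half_width, difference + half_width


low, high = proportion_difference_limits(16, 35, 18, 65)
low_c, high_c = proportion_difference_limits(16, 35, 18, 65, correct=True)
print(f"95 per cent limits: {low:.2f} to {high:.2f}")
print(f"with correction:    {low_c:.2f} to {high_c:.2f}")
```

For these data the uncorrected limits are roughly −.02 and .38, and the
corrected limits roughly −.04 and .40. Both intervals include 0, in
agreement with the test of the example.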


OTHER INDEPENDENT SAMPLE STATISTICS 

When formulas (10.2) and (10.9) are compared, it will be seen
that the latter may be written
$s_{M_1-M_2} = \sqrt{s_{M_1}^2 + s_{M_2}^2}$. That is, the
standard error of a difference between two means may be obtained
from the standard errors of the individual means. The same is true
for other sample statistics.

Thus, to compute the standard error of a difference between any
of the statistics listed in Table 10.3, we need only to square and sum
the standard errors of the two statistics, then extract the square root
of the sum. For example, the standard error of the difference between
two standard deviations is $\sqrt{s_1^2/(2N_1 - 2) + s_2^2/(2N_2 - 2)}$; the
standard error of the difference between the $z_r$ equivalents of two
$r_{xy}$'s is $\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}$; and so on.

The test of a hypothesis about the difference between two popula- 
tion statistics and the construction of a confidence interval for the 
true difference parallel the procedures followed in working with the 
difference between sample means. (See Ex. 7b and Ex. 17.) 
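
The rule of squaring, summing, and extracting the square root may be
sketched as follows. The Python fragment is illustrative and not part of
the original text; all function names are hypothetical. The standard
deviations and sample sizes are borrowed from the earlier example on
semester averages, and their use for the standard-deviation and $z_r$
cases is an assumption made only to have numbers to work with.

```python
import math


def se_of_difference(se_1, se_2):
    """Standard error of a difference between two independent statistics."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)


def se_mean(s, n):
    """Standard error of a mean, s / sqrt(N)."""
    return s / math.sqrt(n)


def se_standard_deviation(s, n):
    """Standard error of a standard deviation, s / sqrt(2N - 2)."""
    return s / math.sqrt(2 * n - 2)


def se_z_r(n):
    """Standard error of the z_r equivalent of r, 1 / sqrt(N - 3)."""
    return 1 / math.sqrt(n - 3)


# Difference between means: agrees with formula (10.9).
se_means = se_of_difference(se_mean(7.62, 98), se_mean(9.02, 46))

# Difference between standard deviations (illustrative use of the same data).
se_sds = se_of_difference(se_standard_deviation(7.62, 98),
                          se_standard_deviation(9.02, 46))

# Difference between two z_r values from samples of 98 and 46 pairs.
se_zr = se_of_difference(se_z_r(98), se_z_r(46))
print(round(se_means, 2), round(se_sds, 2), round(se_zr, 3))
```

The first result should agree with the 1.54 found earlier by formula
(10.9); the other two are shown only to illustrate the pattern.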


CONFIDENCE INTERVALS VS. TESTS OF SIGNIFICANCE 


Several times we have remarked that confidence intervals serve 
to separate hypotheses which are acceptable from those which are 
rejectable at a specified level of significance. The 90 per cent con- 
fidence interval, for example, separates hypotheses which cannot be 
rejected at the 10 per cent level from those which can be on a two- 
sided test. For a one-sided test, the upper (or lower) endpoint of the
interval indicates the value at or beyond which a hypothesis can be
rejected at the 5 per cent level. Other confidence intervals are inter- 
preted similarly. Thus, confidence intervals correspond to regions of 
acceptance; distances below and above their endpoints correspond to 
regions of rejection. These facts stand out when we sketch a normal 
curve and indicate the two-sided rejection region. 

Confidence intervals are usually more informative than tests of 
significance. They tend to be more informative both when a null 
hypothesis is rejected and when it is accepted. In the former case, they 
call attention to the size of the difference. For example, if the 95 per 
cent confidence interval for a difference between population propor- 
tions is, say, .02 to .08, the population proportions are probably not 
very different, although the hypothesis that the difference is zero can 
be rejected at the 5 per cent level. On the other hand, confidence 
intervals reflect the risk of the Type II error when a hypothesis is 
accepted. If the 95 per cent interval for a difference between two 
proportions is, say, −.06 to .52, the population proportions may be
quite different, although the observed difference is not significantly 
different from 0 at the 5 per cent level. In other words, the risk of the 
Type II error is great. 

The only advantage tests of significance have over confidence
intervals, in interpreting observed differences, is that they lead to
definite probability figures. Ordinarily this is not very important.
When it is important, both the P value and the confidence interval
might well be reported.


SAMPLE SIZES NEEDED FOR RELIABLE DIFFERENCES 


In the preceding section, we saw that it was possible to anticipate
the size of the sample needed to provide a confidence interval of
specified width for a parameter. By essentially similar methods, we
may anticipate the sizes of two samples needed to provide a con-
fidence interval of specified width for the difference between two
parameters. In the case of two means, the formula is

$$N_1 = \frac{z^2 (m s_1^2 + s_2^2)}{m d^2}, \qquad (10.14)$$


where $N_1$ is the size of one sample; z is the normal deviate correspond-
ing to the confidence interval desired; m is the multiplier for $N_1$ to
give $N_2$, the size of the other sample; $s_1$ and $s_2$ are estimates or guesses
of the standard deviations of the populations; and d is half the width
of the desired interval. For example, suppose that we want to select
two samples of sufficient sizes to provide a 90 per cent confidence
interval for the difference between two population means no wider
than 6 units, extending 3 units on either side of the sample difference.
Suppose further that we want one sample to be 1.5 times the other in
size and that we guess that $s_1$ will be 10 and $s_2$ will be 12. We take z
as 1.64, m as 1.5, and d as 3. Substituting in (10.14), we get $N_1$ equal
to about 60, so that $N_2$ equals about 1.5(60) or 90. If we take samples
of 60 and 90, the 90 per cent confidence interval for the difference
between population means will be no wider than 3 units below and
above the sample difference, provided our estimates of $s_1$ and $s_2$ were
not too small.
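
The substitution may be sketched as follows. The Python fragment is
illustrative and not part of the original text. It assumes that formula
(10.14) has the form $N_1 = z^2(m s_1^2 + s_2^2)/(m d^2)$, which is what
results from setting the half-width $z\sqrt{s_1^2/N_1 + s_2^2/(mN_1)}$
equal to d; the function names are hypothetical.

```python
import math


def n1_for_mean_difference(z, m, s1, s2, d):
    """Size of the first sample so the interval extends d on either side.

    The second sample is taken as m times the first.
    """
    return z ** 2 * (m * s1 ** 2 + s2 ** 2) / (m * d ** 2)


def half_width(z, s1, n1, s2, n2):
    """Half-width of the interval actually obtained with sizes n1 and n2."""
    return z * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Values of the worked example: 90 per cent interval, m = 1.5, d = 3.
n1 = n1_for_mean_difference(z=1.64, m=1.5, s1=10, s2=12, d=3)
n2 = 1.5 * n1
print(f"N1 = {n1:.1f}, N2 = {n2:.1f}")

# Half-width obtained with the rounded sizes used in the text.
achieved = half_width(1.64, 10, 60, 12, 90)
print(f"half-width with 60 and 90: {achieved:.2f}")
```

Under this reading of the formula, $N_1$ comes out slightly under 60,
which the text rounds to about 60, and samples of 60 and 90 give a
half-width just under 3 units.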

Usually we would want to select two samples of the same size, 
and would therefore set m of formula (10.14) equal to 1. However, 
when the cost or convenience of the samples differs, it may be
desirable to take samples of different sizes. 

We may similarly anticipate the sizes of the samples needed to
provide a confidence interval of specified width for the difference be-
tween two population proportions, $\phi_1 - \phi_2$. The formula turns out
to be

$$N_1 = \frac{z^2 [m p_1(1 - p_1) + p_2(1 - p_2)]}{m d^2}, \qquad (10.15)$$

where $p_1$ and $p_2$ are our estimates or guesses of $\phi_1$ and $\phi_2$ and the
other symbols have the same meaning as in (10.14).
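
A sketch of this computation follows. It is illustrative Python and not
part of the original text. It assumes that formula (10.15) parallels
(10.14), with $p(1 - p)$ in place of each $s^2$; the function name and
the numerical values (guesses of .50 for both proportions, equal sample
sizes, a 95 per cent interval extending .10 on either side) are
hypothetical.

```python
import math


def n1_for_proportion_difference(z, m, p1, p2, d):
    """Size of the first sample for an interval extending d on either side.

    The second sample is taken as m times the first.
    """
    return z ** 2 * (m * p1 * (1 - p1) + p2 * (1 - p2)) / (m * d ** 2)


# Hypothetical planning values, chosen only for illustration.
n1 = n1_for_proportion_difference(z=1.96, m=1, p1=0.5, p2=0.5, d=0.10)
print(f"N1 = N2 = {n1:.1f}")

# Check: half-width obtained when both samples have this size.
check = 1.96 * math.sqrt(0.25 / n1 + 0.25 / n1)
```

Guessing .50 for both proportions gives the largest possible product
$p(1 - p)$, so sample sizes found this way err on the safe side.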

By similar procedures, we may anticipate sizes of samples needed 
to provide confidence intervals of specified width for the difference 
between two parameters corresponding to the other statistics of 
Table 10.3. 

The advantages and limitations of approximations to needed
sample size were previously discussed, p. 250, and we shall not repeat
them here. Before leaving the topic, however, we should note that it is
possible to anticipate, for any selected risk of rejecting a true hy-
pothesis, the size of the sample or samples needed to limit the risk of
accepting a false hypothesis to some specified probability in both
one-sample and two-sample problems. That is, it is possible to limit
the risks of both Type I and Type II errors by controlling sample size.
(See Refs. 10 and 43.)




INDEPENDENT SAMPLE DISTRIBUTIONS 


In our discussion of normality, pp. 251-253, we mentioned tests 
of significance that do not assume population normality. There are 
several such nonparametric or distribution-free tests for differences 
between two population distributions, one of the most useful and 
powerful of which is the Wilcoxon T test, also known as the Mann- 
Whitney U test. 

The Wilcoxon test is intuitively simple. Suppose that we have
two independent samples of $N_1$ and $N_2$ scores which can be ranked
from 1, smallest, to $N_T$, largest, $N_T$ being the total number of scores.
If the sampled populations have identical distributions, we would
expect the sum of ranks of the scores of either sample to have the same
ratio to the total sum of ranks as the size of the sample has to total
sample size. If marked disproportionality exists, we would suspect
that the population distributions are different. Thus, the null hy-
pothesis about two distributions may be tested by comparing an
observed sum of ranks with the sum expected if the hypothesis is true.

Consider the following data, which are the scores on a test of
problem-solving ability of 10 fifth-grade pupils ranked most creative
in writing and 12 fifth-grade pupils ranked least creative, on the basis
of written compositions:

MOST CREATIVE:   42, 44, 46, 50, 54, 56, 57, 57, 57, 59
LEAST CREATIVE:  32, 35, 38, 38, 42, 43, 44, 50, 50, 52, 53, 53


The hypothesis to be tested is that there is no difference in problem- 
solving ability, as measured, between the populations of most creative 
and least creative fifth-grade pupils, represented by the samples, 
against the two-sided alternative hypothesis that there is a difference. 
If we use F(x) and G(x) to symbolize the population distributions, we 
may express the hypotheses, $H_0: F(x) = G(x)$ and $H_a: F(x) \neq G(x)$.

The first step in testing the null hypothesis, Ho, is to combine the 
scores in order of size and rank them from 1 to 22 as shown below, 


underlining the scores and ranks in one of the samples. Notice that
the average of the ranks tied for is assigned to each of the ties.

The next step is to compute the sum of ranks in one of the samples.
For the underlined sample, the sum of ranks is 5.5 + 8.5 + 10 + 12 +
17 + 18 + 20 + 20 + 20 + 22 or 153. This is the observed sum
of ranks in one of the samples. We could compute the expected sum
of ranks in that sample and the standard error of the difference be-
tween the observed and expected sums, but it is not necessary. These
statistics are embodied in the formula

$$z = \frac{|2T - N_1(N_T + 1)| - 1}{\sqrt{N_1 N_2 (N_T + 1)/3}}, \qquad (10.16)$$

where T is the sum of ranks in one sample, $N_1$ is the size of that sample,
$N_2$ is the size of the other sample, and $N_T$ is the total sample size.
Notice that the absolute difference in (10.16) is decreased by 1, a
correction for continuity.

For the illustrative data, we have T = 153, $N_1 = 10$, $N_2 = 12$,
and $N_T = 22$, so that

$$z = \frac{|2(153) - 10(22 + 1)| - 1}{\sqrt{10(12)(22 + 1)/3}} = \frac{75}{\sqrt{920}} = 2.47.$$

Turning to Table C, Appendix, we find that 2.47 corresponds to a P
of .0068. Doubling for the two-sided test, we have P = .014. We
reject $H_0$ and conclude that the distributions of problem-solving
scores are not alike in the populations of most creative and least crea-
tive pupils. Actually, the conclusion may be narrowed. The test is
sensitive mainly to differences in averages. When $H_0$ is rejected, we
may conclude that the populations differ in average value.
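
The ranking and the z ratio for these data may be sketched in Python as
follows. The fragment is illustrative and not part of the original text;
the function names are our own. The ninth score of the most creative
group is entered as 57, the value implied by the three ranks of 20 that
are summed in the text.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Map each value to its rank (1 = smallest), averaging ranks of ties."""
    first_position, count = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
        count[value] = count.get(value, 0) + 1
    # A run of c tied values starting at position f gets rank f + (c - 1) / 2.
    return {v: first_position[v] + (count[v] - 1) / 2 for v in first_position}


def wilcoxon_rank_sum(sample_1, sample_2):
    """Sum of ranks of sample_1 and the z ratio of formula (10.16)."""
    rank_of = average_ranks(sample_1 + sample_2)
    t = sum(rank_of[score] for score in sample_1)
    n1, n2 = len(sample_1), len(sample_2)
    n_total = n1 + n2
    # Absolute difference decreased by 1 as a correction for continuity.
    numerator = max(0.0, abs(2 * t - n1 * (n_total + 1)) - 1)
    z = numerator / math.sqrt(n1 * n2 * (n_total + 1) / 3)
    p_two_sided = min(1.0, 2 * (1 - NormalDist().cdf(z)))
    return t, z, p_two_sided


most_creative = [42, 44, 46, 50, 54, 56, 57, 57, 57, 59]
least_creative = [32, 35, 38, 38, 42, 43, 44, 50, 50, 52, 53, 53]
t, z, p = wilcoxon_rank_sum(most_creative, least_creative)
print(f"T = {t}, z = {z:.2f}, two-sided P = {p:.3f}")
```

The sketch should give a sum of ranks of 153 and a z of about 2.47. The P
it reports is computed from the unrounded z and so may differ slightly
from the tabled .014.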

The Wilcoxon T test is an excellent test. It is nearly as powerful 
as the z test; i.e., it is nearly as capable of detecting real differences, 
and may be applied to small as well as large samples. Although 
designed for continuous data, it appears to work well when ties are 
present and to be applicable to any sort of two-sample data which 
can be combined and logically ranked. When ties are numerous, 
however, the test tends to lose power and may fail to detect real 
differences.

Formula (10.16) is not appropriate if either sample is less than
about 8. For smaller samples, tables of significance values of T should
be used. Such tables, as well as corrections for ties, may be found
in Ref. 44.


Inferences from Two Related Samples 


It is frequently the case that the scores in one sample are paired
or matched with the scores in the other. This situation arises in single
group experiments, in which the same individuals are measured before
and after an experimental treatment. It arises in parallel group ex-
periments, in which each individual in the experimental group is
matched with an individual in the control group and measurements
made on both groups. In general, the situation exists whenever the
scores in two samples or the scores observed under two conditions can
be meaningfully correlated.

When two samples are related, the standard error of the difference 
between their means or other statistics tends to be smaller than that 
for the difference between independent-sample statistics. 


RELATED SAMPLE MEANS 


The standard error of the difference between the means of two
related samples is

$$s_{M_1-M_2} = \sqrt{s_{M_1}^2 + s_{M_2}^2 - 2 r_{12} s_{M_1} s_{M_2}}, \qquad (10.17)$$

where $s_{M_1}$ and $s_{M_2}$ are the standard errors of the individual means as
given by formula (10.2) and $r_{12}$ is the product-moment coefficient of
correlation between the paired scores. It will be seen that if $r_{12}$ is
positive, as is usually the case, the standard error of the difference
between related sample means is smaller than that for independent
sample means. Hence, matched-group experiments tend to be more
efficient than independent-group experiments.

Unless the standard deviations and correlation coefficient are
available, it is easier to compute the standard error by use of the
formula

$$s_{M_1-M_2} = \frac{1}{N}\sqrt{\frac{N\sum D^2 - \left(\sum D\right)^2}{N - 1}}, \qquad (10.18)$$


where N is the number of pairs of scores and D is the difference in any
pair. The application of the formula to the data of Table 10.4 is
illustrated at the foot of the table, where $s_{M_1-M_2}$ turns out to be .967.
We would have got the same result if we had computed $s_{M_1}$, $s_{M_2}$, and
$r_{12}$ and substituted in formula (10.17).

TABLE 10.4

Computation of Standard Error of Difference
between Means of Two Related Samples

PAIR     $X_1$     $X_2$     $D = X_1 - X_2$     $D^2$
  1        8       15           −7             49
  2       18       19           −1              1
  3       20       16            4             16
  4       14       17           −3              9
  5       22       18            4             16
  6       18       18            0              0
  7       23       16            7             49
  8        5       17          −12            144
  9        4        2            2              4
 10       11       19           −8             64
 11        7        6            1              1
 12       27       25            2              4
 13       16        9            7             49
 14        2       11           −9             81
 15        4        4            0              0
 16        6        8           −2              4
 17        1        9           −8             64
 18        7        7            0              0
 19        9        7            2              4
 20       10       17           −7             49
 21        9       15           −6             36
 22       10       16           −6             36
 23       13       10            3              9
 24        8       13           −5             25
 25       17       14            3              9
 26       12        8            4             16
 27       10       15           −5             25
 28       10       10            0              0
SUM      321      361          −40            764
MEAN   11.46    12.89        −1.43

$\sum D = -40$. $\sum D^2 = 764$. By formula (10.18),

$$s_{M_1-M_2} = \frac{1}{28}\sqrt{\frac{28(764) - (-40)^2}{28 - 1}} = .967.$$

After we find the standard error, we may use it to test hypotheses
about the difference between related means and to determine con-
fidence intervals, paralleling the procedures used for independent
means. Consider the data of Table 10.4. They are the scores, $X_1$, on
an inventory of superstitious beliefs of 28 high-school students who
had three years of science and the scores, $X_2$, of 28 students who had
one year of science, the students being matched on socioeconomic
status and IQ. The question of interest is whether there is a significant
difference between the means of these matched groups. To answer
the question, we may test the null hypothesis $H_0: \mu_1 - \mu_2 = 0$
against the two-sided alternative $H_a: \mu_1 - \mu_2 \neq 0$. The means are
11.46 and 12.89, as shown in Table 10.4, and the standard error of the
difference between means is .967. The z ratio, similar to that of for-
mula (10.10), is

$$z = \frac{11.46 - 12.89}{.967} = -1.48,$$


and the corresponding P for a two-sided test is 2(.0694) or .14. We 
cannot reject the hypothesis. We conclude that students having three 
years of high school science and comparable students having one year 
of science do not differ significantly in superstitious beliefs, as re- 
flected by the inventory. 

Confidence intervals for the difference $\mu_1 - \mu_2$ may be deter-
mined just as they are in the case of independent samples. For ex-
ample, the 95 per cent interval for the difference in the above example
is $-1.43 \pm 1.96(.967)$ or −3.33 to .47.
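
The related-sample computations for Table 10.4 may be sketched in Python
as follows. The fragment is illustrative and not part of the original
text; the function name is our own. The scores are those of the table,
with the entries for pairs 18 and 19 taken as the values that agree with
the printed sums of 321 and 361. Because the sketch works from unrounded
means, its results may differ from the hand computation in the last
decimal place.

```python
import math
from statistics import NormalDist

x1 = [8, 18, 20, 14, 22, 18, 23, 5, 4, 11, 7, 27, 16, 2,
      4, 6, 1, 7, 9, 10, 9, 10, 13, 8, 17, 12, 10, 10]
x2 = [15, 19, 16, 17, 18, 18, 16, 17, 2, 19, 6, 25, 9, 11,
      4, 8, 9, 7, 7, 17, 15, 16, 10, 13, 14, 8, 15, 10]


def related_means_test(scores_1, scores_2):
    """Mean difference, its standard error by (10.18), z, and two-sided P."""
    d = [a - b for a, b in zip(scores_1, scores_2)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(value * value for value in d)
    # Formula (10.18): (1 / N) * sqrt((N * sum D^2 - (sum D)^2) / (N - 1)).
    se = math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1)) / n
    mean_difference = sum_d / n
    z = mean_difference / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return mean_difference, se, z, p_two_sided


mean_difference, se, z, p = related_means_test(x1, x2)
low = mean_difference - 1.96 * se
high = mean_difference + 1.96 * se
print(f"mean difference = {mean_difference:.2f}, standard error = {se:.3f}")
print(f"z = {z:.2f}, two-sided P = {p:.2f}")
print(f"95 per cent interval: {low:.2f} to {high:.2f}")
```

The sketch should give a standard error of about .967, a z of about
−1.48, a two-sided P of about .14, and 95 per cent limits close to the
−3.33 and .47 found above.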


RELATED SAMPLE PROPORTIONS. THE SIGN TEST 


There are several situations in which the samples from twofold 
populations are related. For instance, the proportion of individuals 
who respond in a certain way to a test or questionnaire item may be 
compared with the proportion of the same individuals who respond 
similarly to the item after some experimental treatment or who re- 
spond similarly to a second test or questionnaire item. As a second 
instance, the proportion of individuals in a sample having a certain 
characteristic may be compared with the proportion in a second sample, 
the individuals in the two samples being matched on some basis. 

Consider the data obtained by asking 23 graduate students at the 
beginning and end of a course in principles of guidance whether they 
believed psychotherapy should be attempted by school psychologists. 
The 23 responses to the question at the beginning and at the end of 
the course are paired below, the y's indicating “yes” and the n’s in- 
dicating “no.” (Of course other codes, such as 1 for “yes” and 0 for 
“no” might equally well be used.) 


BEGINNING: Ynnynyyynyyynyyyyyynyyy 


END: "PhyYynnynnnynynnnynnynny 


Our concern is whether there is a significant difference between the 
proportions of y's (or n’s) at the beginning and the end of the course. 
This leads to a test of the null hypothesis that the proportion of y's 
in the "beginning" population is equal to the proportion of y's in the
"end" population. We may state the hypothesis, $H_0: \phi_1 = \phi_2$.


There are several ways of testing the hypothesis, all of which can 
be reduced to what is known as the sign test. To apply the sign test, we 
replace each (y — n) pair with a plus sign and each (n — y) pair with 
a minus sign, disregarding the (y — y) and (n — n) pairs, since they 
show no difference. For the 23 pairs above, we get 12 plus signs and 
3 minus signs, a sample of 15 signs in all. 

If the null hypothesis is true, we expect half of the signs to be plus 
and half minus. Hence, we may test the hypothesis by determining 
whether the proportion of plus (or minus) signs differs significantly 
from .50. For the illustrative data, the observed proportion of plus
signs is 12/15 or .80. By substitution in formula (10.5), we get

$$z = \frac{|.80 - .50| - \dfrac{1}{2(15)}}{\sqrt{.50(1 - .50)/15}} = 2.07.$$

The corresponding P, for a two-sided test, is 2(.0192) or about .04.
We may reject the hypothesis $H_0: \phi_1 = \phi_2$ at the 4 per cent level.
We conclude that the proportions of “yes” answers to the question of
whether school psychologists should attempt psychotherapy are not
the same in the “beginning” and “end” populations.

As used in the sign test, formula (10.5) may be simplified. If we
let h be the number of plus (or minus) signs, the proportion is h/N.
Substituting h/N for p and .5 for $\phi$ in formula (10.5) and simplifying,
we obtain

$$z = \frac{|2h - N| - 1}{\sqrt{N}}, \qquad (10.19)$$

where h is the number of plus (or minus) signs and N is the total num-
ber of signs. The student can verify that when 12 (or 3) is substituted
for h in (10.19) and 15 for N, z turns out to be 2.07, as before.

Formula (10.19) is satisfactory in practical work when N is as
small as 5. It is to be remembered that in the sign test, N is the number
of signs of differences within pairs. When there are zero differences,
N is smaller than the original number of pairs.
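
The sign test may be sketched in Python as follows. The fragment is
illustrative and not part of the original text; the function names are
our own. The second function assumes that formula (10.5), which is not
reproduced in this section, carries a continuity correction of 1/(2N),
which is the form that simplifies to (10.19).

```python
import math
from statistics import NormalDist


def sign_test(h, n):
    """z ratio of formula (10.19) and its two-sided P.

    h is the number of plus (or minus) signs, n the total number of signs.
    """
    z = max(0.0, abs(2 * h - n) - 1) / math.sqrt(n)
    p_two_sided = min(1.0, 2 * (1 - NormalDist().cdf(z)))
    return z, p_two_sided


def sign_test_from_proportion(h, n):
    """The same z computed from the observed proportion h / n."""
    return (abs(h / n - 0.5) - 1 / (2 * n)) / math.sqrt(0.5 * 0.5 / n)


# 12 plus signs and 3 minus signs among 15 nonzero differences.
z, p = sign_test(12, 15)
z_minus, _ = sign_test(3, 15)
print(f"z = {z:.2f}, two-sided P = {p:.2f}")
```

Either count, 12 or 3, gives the same z of about 2.07, and the two
formulas agree.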

The sign test is applicable to data other than related-sample 
proportions. Look at the data of Table 10.5. These data will be used 
a little later to illustrate another related-sample test; at the moment 
consider only the signs of the differences in the third column. There 
are 15 plus signs and 4 minus signs. To test the null hypothesis that
the “Unsigned” and “Signed” populations are alike, we may sub-
stitute in (10.19) to get

$$z = \frac{|2(15) - 19| - 1}{\sqrt{19}} = 2.29.$$

This z is significant at about the 2 per cent level on a two-sided test,
and at this level we may reject the hypothesis that the populations
are alike. Actually, as used here, the sign test is essentially a test for a
difference in average value, and we may conclude that the population
averages are different.

The sign test of the significance of differences between two related
samples is easily applied. All that is needed are the signs of the dif-
ferences within pairs of observations. When the differences are reliable
and independent, the test is appropriate. It is as good a test as any
available when the data are qualitative. When the data are quanti-
tative, as in Table 10.5, the sign test is useful primarily in quick and
rough analysis. It is not as powerful as either the test for related means,
discussed earlier, or the signed-rank test, discussed below.


OTHER RELATED SAMPLE STATISTICS 


When a test of significance for a difference between two related- 
sample statistics is not available, as is the case for most of the sta- 
tistics listed in Table 10.3, a rough test may be made, using the stand- 
ard error for the independent-sample difference. For example, to test 
for a difference between two related-sample medians, the standard 
error for independent-sample medians may be used. 

The standard error for a difference between independent-sample
statistics tends to be larger than that for related samples, and its use
therefore tends to obscure significance. If, however, significance is
indicated by the rough test, presumably it is safe to conclude that a
difference exists between the populations represented by the related
samples.


RELATED-SAMPLE DISTRIBUTIONS. THE SIGNED-RANK TEST 


Consider the data of Table 10.5. They were obtained in a study 
of the validity of an adjustment inventory (Ref. 28). The inventory 
was given to a group of 22 subjects, with the request that it be returned 
unsigned. By hidden identification marks, it was possible to match 
the unsigned inventories with inventories given a few days later, the
latter being given with the usual request for signatures. The numbers
of “maladjustment” items checked in the two situations, the dif- 
ferences, and the ranks of the absolute nonzero differences are shown 
in the table. Notice that the ranks of the absolute differences are 
separated into two groups, according to whether the original difference 
was positive or negative. 

If the null hypothesis is true that the population distributions are 
alike, we would expect the sum of positive ranks to equal the sum of 
negative ranks. If the departure from equality is too great to be reason- 
ably attributed to sampling fluctuations, i.e., chance, we would reject 
the null hypothesis, $H_0: F(x) = G(x)$, and conclude that the popula-
tions are different.

TABLE 10.5

Application of Signed-Rank Test to Results from an Adjustment
Inventory Given to 22 Subjects Under Two Conditions

CONDITION*                   DIFFERENCE   RANK OF ABSOLUTE   POSITIVE   NEGATIVE
UNSIGNED, X    SIGNED, Y       X − Y         DIFFERENCE        RANKS      RANKS

    37            23            +14            14.5             14.5
    44            34            +10            10               10
    55            59             −4             3.5                         3.5
    70            25            +45            19               19
    26            16            +10            10               10
    39            12            +27            18               18
    26            16            +10            10               10
    30            25             +5             5                5
    85            60            +25            17               17
    83            69            +14            14.5             14.5
    74            74              0
    36            36              0
    39            50            −11            12.5                        12.5
    39            61            [?]             1                           1
    36            29             +7             8                8
    81            70            +11            12.5             12.5
    58            52             +6             6.5              6.5
    19            34            −15            16                          16
    11             7             +4             3.5              3.5
    45            42             +3             2                2
    14            14              0
    19            13             +6             6.5              6.5
   SUM                                        190.0            157.0       33.0

* “Signed” and “Unsigned” indicate that the subjects signed or did not sign
their names. The words have nothing to do with “signed” as used in signed-rank
test.




The difference between observed and expected sums of ranks and
the standard error of the difference are incorporated in the formula

$$z = \frac{|2T - N(N + 1)/2| - 1}{\sqrt{N(N + 1)(2N + 1)/6}}, \qquad (10.20)$$

where T is the sum of positive or negative ranks (it is easier to work
with the smaller sum) and N is the number of nonzero differences.

For the data of Table 10.5, T = 33 and N = 19. Substituting
in (10.20), we have

$$z = \frac{|2(33) - 19(19 + 1)/2| - 1}{\sqrt{19(19 + 1)(38 + 1)/6}} = 2.47.$$

The corresponding probability in Table C, Appendix, is .0068. Dou-
bling for the two-sided test, we get P = .014. We reject the null
hypothesis and conclude that it is very likely that there is a real dif-
ference between signed and unsigned responses to this inventory.

The signed-rank test is a nonparametric test requiring no as-
sumption of normality. It was devised by Wilcoxon (Ref. 52) and,
like his T test for independent samples, p. 262, it is sensitive mainly
to differences in population averages. When it rejects the null hy-
pothesis that the distributions are alike, we may conclude that the
populations differ in average value. It is nearly as powerful as the z
test for related-sample means and may be used with small as well as
large samples. When sample size is about 8 or more, formula (10.20)
gives a close approximation to the exact significance level of an ob-
served sum of ranks; in smaller samples, if accurate results are desired,
a table of exact significance levels should be used. (See Ref. 44.)

In summary, the steps in applying the signed-rank test are as
follows:

1. Find the difference in each pair of scores, subtracting in the same
direction and attaching signs to the differences. Eliminate zero dif-
ferences.

2. Rank the differences in order of absolute value from 1, smallest, to
N, largest, N being the number of nonzero differences. Where ties
occur, assign the average of the ranks tied for to each of the ties.

3. Classify the ranks as positive or negative, according to the signs of
the original differences.

4. Compute the sum of either the positive ranks or the negative ranks.
Disregard the sign of the sum and substitute in formula (10.20).

5. Find the probability corresponding to z. Double the probability if
a two-sided test is desired, as is usually the case. If N is less than
about 8, the results are only roughly approximate, but usually satis-
factory in practical work.
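
The five steps may be sketched in Python as follows. The fragment is
illustrative and not part of the original text; the function name is our
own. The differences are those of Table 10.5, except that the one
difference of rank 1, which is illegible in our copy of the table, is
entered as −2. That value is an assumption; any negative difference
smaller than 3 in absolute value leads to the same ranks.

```python
import math
from statistics import NormalDist


def signed_rank_test(differences):
    """Signed-rank test by formula (10.20), following steps 1 to 5."""
    # Step 1: keep the signed differences, eliminating zeros.
    d = [value for value in differences if value != 0]
    n = len(d)

    # Step 2: rank the absolute differences, averaging ranks of ties.
    first_position, count = {}, {}
    for position, value in enumerate(sorted(abs(v) for v in d), start=1):
        first_position.setdefault(value, position)
        count[value] = count.get(value, 0) + 1
    rank = {v: first_position[v] + (count[v] - 1) / 2 for v in first_position}

    # Steps 3 and 4: separate the ranks by sign and sum each group.
    positive = sum(rank[abs(v)] for v in d if v > 0)
    negative = sum(rank[abs(v)] for v in d if v < 0)
    t = min(positive, negative)  # the smaller sum is easier to work with

    # Formula (10.20), with the continuity correction of 1.
    numerator = max(0.0, abs(2 * t - n * (n + 1) / 2) - 1)
    z = numerator / math.sqrt(n * (n + 1) * (2 * n + 1) / 6)

    # Step 5: double the upper-tail probability for a two-sided test.
    p_two_sided = min(1.0, 2 * (1 - NormalDist().cdf(z)))
    return n, positive, negative, z, p_two_sided


# Differences X - Y from Table 10.5; the -2 is the assumed entry noted above.
differences = [14, 10, -4, 45, 10, 27, 10, 5, 25, 14, 0,
               0, -11, -2, 7, 11, 6, -15, 4, 3, 0, 6]
n, positive, negative, z, p = signed_rank_test(differences)
print(f"N = {n}, positive sum = {positive}, negative sum = {negative}")
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

The sketch should give 19 nonzero differences, rank sums of 157 and 33,
and a z of about 2.47. Since formula (10.20) uses an absolute difference,
either rank sum leads to the same z.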


CONCLUDING REMARKS 


The normal sampling distribution is applicable to a wide variety 
of problems. The sampling distributions of most of the commonly 
used statistics approach normality as sample size increases, with 
standard errors which can be satisfactorily approximated from sample 
statistics. When sample size is about 30 or more, the use of normal 
curve procedures is justified in practical work and gives results which 
are ordinarily in good agreement with the results obtained from more 
exact procedures. 

The normal sampling distribution is the backbone of so-called 
large sampling theory. It is widely useful, the most generally useful 
sampling distribution available. When N is small, however, it does 
not permit trustworthy inferences. What is needed is a distribution 
that is independent of unknown population parameters and is ap-
plicable to samples of any size. This brings us to the t distribution of
the next chapter.


EXERCISES 


1. The 138 MAT scores of Table A, Appendix, have a mean of 552 and a 
standard deviation of 79. Test the hypothesis at the 5 per cent level that 
the population mean is 540. Interpret. 

2. In Exercise 1, above, find the 95 per cent confidence interval for the
population mean. Why is the confidence interval more informative than
the test of hypothesis?

3. In an opinion poll, 61 voters in a random sample of 100 favored Can-
didate X. (a) How sure can we be that a simple majority in the popula-
tion favor X? (b) Find and interpret the 95 per cent confidence interval
for the population proportion.

4. The coefficient of correlation between 19 pairs of scores is .54.
(a) Test the hypothesis at the 1 per cent level that the population coef-
ficient is 0. (b) Find and interpret the 99 per cent confidence interval for
the population coefficient. (c) Why is the risk of the Type II error
relatively large?

5. Consider the freshmen of Table A, Appendix, a random sample of past,
present, and future freshmen of the given college. How would you test
these hypotheses about the population? (No computations required.)


a. That the semester average is 73. 


b. That the proportion of private school students is .35. 


c. That the coefficient of correlation between Regents' averages and
semester averages is .70.

d. That the distribution of PCR is normal.

6. (From Tate and Straub, School Review (1964) 72:74-88.) In independent
samples of Catholic academy and public school ninth-grade girls, the
following statistics in a test of critical thinking were observed: Academy,
N = 76, M = 18.32, s = 4.58; Public school, N = 43, M = 20.45,
s = 3.82. (a) At what level is the difference between means significant?
(b) Find and interpret the 99 per cent confidence interval for the true
difference.

7. The question arose in a large coeducational college as to whether the
men students made better marks than the women. The following data
were collected in random samples of 80 men and 60 women: Mean
grade index for men, 3.82 with standard deviation of .90; mean grade
index for women, 3.25 with standard deviation .60. (a) Is it reasonable
to believe that the men and women are equal in achievement? (b) That
they are equally variable in achievement? (c) To what extent are the
results of (a) and (b) applicable to other colleges?

8. Referring to Exercise 37, Chapter VII, (a) is the difference between
proportions of successful and unsuccessful teachers who had held elec-
tive offices significant at the 1 per cent level? (b) What question does
the test for significance answer not answered by the fourfold coefficient
of correlation, rp?

9. In Exercise 8, above, find and interpret the 99 per cent confidence in-
terval for the true difference between proportions.

10. In an experiment designed to test the value of wrong responses in in-
ductive reasoning, 30 subjects were randomly assigned to two groups.
Group I was given opportunity to make mistakes in looking for cues in
a problem situation; Group II was not. The errors made by the subjects
in a later test of ability to discover cues are given below. Is the dif-
ference between groups significant at the 5 per cent level? Interpret
the results.

GROUP I:  6, 6, 7, 8, 8, 8, 9, 9, 9, 9, 10, 10, 11, 12, 13
GROUP II: 6, 7, 8, 10, 10, 12, 12, 13, 13, 14, 14, 14, 15, 15, 16


Criticize each of the following: 


a. An investigator measured every member of a population. 

ported both mean and standard error of the mean. 

In a sample of 12 from a normal population, an investigator observed 

a mean of 50 and a standard deviation of 5. From these data, he 

determined the standard errors and the confidence limits for q and c. 

c. Thirty pairs of fathers and sons were given a questionnaire regarding 
views toward conformity behavior. The significance of the difference 


He re- 
b. 


12. 


13. 


1S. 


LL 


. In Table 8.2, p. 206, are items 


. In tests of understanding princip 


The Normal Sampling Distribution / 273 


between the means of fathers and sons was tested by formula (10.16). 

d. An investigator found the 95 per cent confidence interval for the 
true difference between two proportions to extend from —.10 to .50. 
He concluded that there was no reason at all to doubt the null hy- 
pothesis that the population proportions were equal. 

e. An investigator found that the difference between the means of 
large samples of boys and girls on a 100-word vocabulary test was 
highly significant, with P less than .001. He concluded that boys 
and girls differ greatly in vocabulary. 


Two groups of college students, 100 in each group, were matched on 
initial performance in a biology test. One group was taught by the 
lecture-demonstration method, the other by a lecture-laboratory method. 
At the end of the experimental period, the first group had a mean, on 
the final test, of 56 with standard deviation of 7; the other group had a 
mean of 54 with a standard deviation of.6. The coefficient of correlation 
between initial and final test was .50. What can be concluded ? 

In Table A, Appendix, there are 34 freshmen of A socioeconomic 
ratings and 45 of C ratings. Suppose you could match about 30 of the 
former with a like number of the latter on the basis of SAT scores or 
percentile class rank. How would you see whether the two groups 


differed significantly in semester average? . i 
8 and 9 significantly different in diffi- 


culty at the 5 per cent level; i.e., is the proportion passing item 8 
significantly different from the proportion passing item 9. Repeat for 
items 25 and 26. . 

At the beginning of a course in international relations, the students 
were asked whether they believed war to be inevitable. Of the 62 students 
enrolled, 37 replied “yes” and 25 replied "no." At the end of the course, 
20 of the 37 who had replied “yes” replied “no,” while 5 of the 25 who 
had replied “no” replied “yes.” There were no other changes of opinion. 
Is the difference between proportions of "yes" replies at the beginning 


and end of the course significant at the 3 per cent level? . 
les and applying principles, 12 medical 


hown below. Is the difference between 


school students made scores as $ 
ficant at the 5 per cent level? 


understanding and application signi 
49 21 44 34 25 27 29 30 17 36 24 23 


UNDERSTANDING: 
9 40 19 11 25 30 24 24 4| 18 19 


APPLICATION: 42 


In a sample of 147 freshmen from College X the coefficient of correlation 


between a scholastic aptitude test and freshman grade-point average 
is .32; in a sample of 67 freshmen from College Y, the coefficient 15 31. 
Is the difference between coefficients significant at the 5% level? Con- 
struct and interpret the 95% confidence interval for the true difference. 


CHAPTER XI 


The t Sampling Distribution 


The exact sampling distribution of the ratio of a difference between
sample and population means to its standard deviation was first
investigated by the English statistician W. S. Gosset, who wrote
under the pen name "Student." (Ref. 40.) By somewhat empirical
methods, he obtained the sampling distribution of the ratio
$(\bar{X} - \mu)/s'$, where $\bar{X}$ and $\mu$ are sample and population
means, respectively, and $s' = \sqrt{\Sigma x^2/(N - 1)}$. Since
$s = \sqrt{\Sigma x^2/N}$, it will be seen that $s' = s\sqrt{N/(N - 1)}$.
As mentioned on p. 238, the standard deviation, $s$, of a sample tends to
underestimate the standard deviation, $\sigma$, of the population. The
quantity $s'$ is thus better than $s$ as an estimate of $\sigma$.

Later statisticians, notably Fisher, building on "Student's" foundation,
determined the sampling distribution of a t ratio. This ratio,
essentially similar to "Student's" ratio, is defined as

$$t = \frac{\bar{X} - \mu}{s'/\sqrt{N}} = \frac{(\bar{X} - \mu)\sqrt{N}}{s'} \tag{11.1}$$


In an experimental study of the sampling distribution of the t
ratio, 120 samples of 5 each were randomly selected from the population
of scores, Table B, Appendix, in which $\mu$ is 40. The difference
between each sample mean $\bar{X}$ and the population mean $\mu$ was
divided by the value of $s'$ in the sample and multiplied by $\sqrt{5}$.
To illustrate, the first sample had a mean of 42.5 with $s' = 6.5$, so
that the t ratio for that sample was $(42.5 - 40)\sqrt{5}/6.5$ or about .9.
The distribution of the 120 t ratios thus obtained is shown in Table 11.1
and in Figure 11.1. The distribution is nearly symmetrical about zero
($g_1 = -.03$), but is markedly leptokurtic ($g_2 = 3.86$). This
experimental sampling distribution is in good agreement with the
theoretical distribution, as indicated in the figure.
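The sampling experiment can be imitated on a computer. The following is only a rough sketch, not the procedure used for Table 11.1: it draws from a simulated normal population rather than from Table B, and the population standard deviation of 10, the seed, and the function names are assumptions made for illustration.

```python
import math
import random
import statistics


def t_ratio(sample, mu):
    """t ratio of formula (11.1): (sample mean - mu) * sqrt(N) / s'."""
    n = len(sample)
    s_prime = statistics.stdev(sample)  # stdev divides by N - 1, giving s'
    return (statistics.mean(sample) - mu) * math.sqrt(n) / s_prime


def simulate_t_ratios(num_samples=120, n=5, mu=40.0, sigma=10.0, seed=1):
    """Draw repeated normal samples and return the t ratio of each.

    sigma and seed are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)
    return [
        t_ratio([rng.gauss(mu, sigma) for _ in range(n)], mu)
        for _ in range(num_samples)
    ]


ratios = simulate_t_ratios()
print(len(ratios), "t ratios; largest absolute value:",
      round(max(abs(t) for t in ratios), 2))
```

With samples as small as 5, a run of this kind will usually show a few ratios well beyond 3 in absolute value, which is the heavy-tailed behavior described above.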

As the experiment illustrates, the t distribution is generated by the
joint variation of $\bar{X} - \mu$ and $s'$ in successive samples. In other
words, the distribution allows for the variability of both $\bar{X}$ and
$s'$. Since the variability depends in part on the size of the sample, the
distribution changes with sample size or, more exactly, with what is
known as degrees of freedom. Before considering the t distribution
further, we need to look into the meaning of degrees of freedom.

DEGREES OF FREEDOM 


The complete explanation of the concept of degrees of freedom lies
in advanced statistical theory, but an understanding of the concept
for practical purposes is not difficult to acquire.

The number of degrees of freedom of an estimate of a parameter
always refers to the number of independent values which contribute
to the estimate. Let us see why this number is $N - 1$ for the estimate
$s'$ appearing in the t ratio of formula (11.1).

Fig. 11.1. Frequency polygon of the distribution of 120 t ratios (Table 11.1)
and curve of the theoretical t distribution having 4 degrees of freedom.
(Horizontal axis: t ratio; vertical axis: frequency.)

TABLE 11.1. Distribution of t Ratios in 120 Random Samples of 5 Each
from a Normal Population

Suppose that we are sampling from a normal population of scores
with standard deviation $\sigma$ and that the scores are in deviation form.
(This means merely that the mean of the population has been subtracted
from each score.) If we had all of these scores in the population,
they would of course sum to zero. A sample of deviation scores,
however, would not ordinarily sum to zero; in fact, if there were no
restrictions, the sum of a sample of deviation scores might be very
different from zero owing to sampling fluctuations. One restriction
that we could place on a sample of N deviation scores, before using
them to estimate $\sigma$, would be that they sum to zero, i.e.,
$\Sigma x = 0$. In other words, we could force the sample and population
to agree in mean value and thus make the sample more representative of
the population than it otherwise might be. But if this restriction were
made, one of the N deviation scores would not be independent. The student
can easily convince himself that this is so by arbitrarily fixing $N - 1$
deviation scores and noting that the Nth is always determined by
the sum of the others and the restriction $\Sigma x = 0$.
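The check suggested to the student can be written out in a few lines. This is a small sketch with arbitrary numbers; the function name and the four chosen values are hypothetical.

```python
def complete_deviations(free_values):
    """Given N - 1 freely chosen deviation scores, return all N scores
    under the restriction that the deviations sum to zero."""
    last = -sum(free_values)  # the Nth score is forced by the restriction
    return list(free_values) + [last]


free = [3.0, -1.5, 4.0, -2.5]        # N - 1 = 4 arbitrary choices
scores = complete_deviations(free)   # the fifth score is not free to vary
print(scores, "sum =", sum(scores))
```

Whatever four values are chosen, the fifth is fixed, so only N - 1 of the N deviation scores carry independent information.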

We may look at this in a somewhat different way, but with the
same results. In estimating the standard deviation $\sigma$ of a population
from a sample of N scores, we cannot use the deviations $X - \mu$ of the
scores from the population mean, since this mean is unknown. We
therefore take the deviations $X - \bar{X}$ of the scores from the sample
mean and, in effect, force population and sample means to agree in
estimating $\sigma$. The restriction results in the loss of one degree of
freedom, for reasons noted in the paragraph above.

Since $N - 1$ of the sample data are free to vary, $N - 1$ contribute
independently to the value of $s'$. Consequently, the variation
of $s'$ from sample to sample depends upon the variation not of N but
of $N - 1$ of the sample data. The variation of the t ratio of formula
(11.1), for any $\bar{X}$, in turn depends upon the variation of the
denominator $s'$. Thus, the distribution of that ratio has $N - 1$ degrees
of freedom.

The number of degrees of freedom of an estimate of a parameter 
is not always one less than the number of original observations; the 
decrease depends upon the number of independent restrictions necessary 
in arriving at the estimate or, what is the same thing, the number of 
constants which must be determined from the observations. The un- 
derlying principle, however, is always the same. Essentially the prin- 
ciple is that a statistic, as an estimate of a parameter, has degrees of free- 
dom equal in number to the number of independent observations con- 
tributing to its value. This is good common sense as well as sound 
theory. The principle may be stated in any of the following ways: The 
number of degrees of freedom of a statistic, taken as an estimate of a 
parameter, is equal to— 


a. The number of observations less the number of independent restrictions
placed upon them in calculating the statistic. For example, in
calculating $s'$ we have N deviation scores which have the single
restriction that they must sum to zero.

b. The number of observations minus the number of constants determined
from them used in calculating the statistic. In calculating $s'$
we use N observations and one constant, the mean, determined from
the observations.

c. The number of observations contributing to the value of the statistic
that are free to vary. In determining $s'$, $N - 1$ observations are free
to vary, only one being fixed.




AREAS UNDER t CURVES 

As noted above, the shape of the t curve varies with the number of
degrees of freedom, df or n. The t curves corresponding to n's of 1,
2, and 5, along with a normal curve drawn to the same scale, are shown
in Figure 11.2. In effect there are many t distributions, one corresponding
to each n, but as n becomes larger the curve approaches normal
form rapidly. When n is greater than about 30, the normal curve is not
a bad approximation to the t curve, and the approximation becomes
steadily better as n increases. For n small, the curve is decidedly
leptokurtic, with larger tails than the normal curve.

Since the form of the t distribution changes with n, the area under
the curve subtended by given intervals on the base line varies with n.
Hence, the probability figure corresponding to a given t ratio depends
upon n. Table F, Appendix, is a typical table of t ratios corresponding
to specified probability figures for various n's. As noted at the foot of
the table, the probability figures are based on one tail of the distribution.
By relating the curves in Figure 11.3 to Table F and working
Exercises 1 and 2, the student can readily become proficient in the use
of the table.


Inferences from Single Samples 


Unlike the z test of the preceding chapter, the t ratio or t test can
be applied to only a few of the commonly used statistics. However,
where it can be applied, it has the advantage of giving exact results in
samples of any size from normal populations. On the other hand,
since t tables give relatively few probability figures or significance
levels, exactness is often theoretical rather than realized.

The t test is chiefly useful in one-sample problems in drawing
inferences from small sample means and in testing the significance of
correlation coefficients.

Fig. 11.2. t curves for n = 1, n = 2, and n = 5 and normal curve.
(Horizontal axis: values of t; vertical axis: relative frequency.)

Fig. 11.3. The two-sided regions of rejection in the t sampling distribution
for n = 2, n = 4, and n = 30. (The regions of acceptance extend from
-4.30 to 4.30 for n = 2, from -2.78 to 2.78 for n = 4, and from -2.04 to
2.04 for n = 30.)


INFERENCES FROM A SAMPLE MEAN 


The use of the t distribution in drawing inferences about a normal
population mean, $\mu$, from a sample of any size, involves only the
calculation of the t ratio and the referral of the ratio to Table F, under
the appropriate number of degrees of freedom. It is to be remembered
that the probability figures corresponding to the tabled values of t are
based on one tail of the distribution.


EXAMPLE. In a random sample of 10 from a population of normally
distributed IQ's the following IQ's are observed: 105, 98, 120, 95,
115, 100, 110, 125, 92, 130. Is the evidence provided by this sample
consistent with the view that the mean in the population is 100?

The mean of the sample is 109, and $\Sigma x^2 = 1{,}558$. Hence,
$s' = \sqrt{1{,}558/9} = 13.2$. The hypothesis to be tested
is $H_0: \mu = 100$ against the alternative hypothesis
$H_a: \mu \neq 100$. By formula (11.1), we have the ratio

$$t = \frac{(109 - 100)\sqrt{10}}{13.2} = 2.16$$

with 9 degrees of freedom. Entering Table F at n = 9, we note that
since 2.16 falls between 1.83 and 2.26, P must lie between .05 and
.025. This may be indicated by the expression, .05 > P > .025,
which is read ".05 is greater than P is greater than .025." Doubling
for the two-sided test, we get .10 > P > .05. We can reject the null
hypothesis at the 10 per cent, but not at the 5 per cent level. Whether
we conclude that the evidence is consistent with the view that the
population mean is 100 depends on whether our criterion is the 10
per cent or the 5 per cent level.
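The arithmetic of this example can be checked with a short script. It is offered only as a sketch: the function name is hypothetical, and the comparison with Table F is still made by hand, since no table of t is built in.

```python
import math


def one_sample_t(scores, mu0):
    """Return mean, sum of squared deviations, s', t, and degrees of
    freedom for testing the hypothesis that the population mean is mu0."""
    n = len(scores)
    mean = sum(scores) / n
    sum_sq = sum((x - mean) ** 2 for x in scores)   # sum of x^2 (deviations)
    s_prime = math.sqrt(sum_sq / (n - 1))           # estimate of sigma
    t = (mean - mu0) * math.sqrt(n) / s_prime       # formula (11.1)
    return mean, sum_sq, s_prime, t, n - 1


iqs = [105, 98, 120, 95, 115, 100, 110, 125, 92, 130]
mean, sum_sq, s_prime, t, df = one_sample_t(iqs, 100)
print(mean, sum_sq, round(s_prime, 1), round(t, 2), df)
```

The computed t of about 2.16 with 9 degrees of freedom is then referred to Table F as described in the example.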


It is instructive to compare the probability figure from Table F
with that we would have obtained had we treated 2.16 as a z ratio and
referred it to the normal probability scale. The latter procedure yields
a P of about .03 for the hypothesis $\mu = 100$. The reason for the
discrepancy is apparent when the tails of the curve for n = 9 are
compared with those of the normal curve. For small samples, normal
sampling distribution methods give probabilities which are substantially
too small and thus result in the rejection of hypotheses more often
than is justified.

Unlike the normal sampling distribution, the t distribution does
not permit a general statement or formula for the confidence limits of
a population mean. This is because the width of a specified interval
varies with n; the smaller the value of n the wider the interval.

To find, say, the 95 per cent confidence limits for $\mu$ from a given
sample of size N, we must find the value of t in Table F, Appendix,
corresponding to the probability figure .025 at n (= N - 1) and
substitute in formula (11.1). Applying this procedure in the above example
we would have

$$\pm 2.26 = \frac{(109 - \mu)\sqrt{10}}{13.2},$$

from which, solving for $\mu$, we would obtain $109 \pm 9.4$ or 99.6 and
118.4 as the 95 per cent confidence limits. To obtain the 90 per cent
confidence limits, we would set $t = \pm 1.83$ and proceed as above.
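Solving formula (11.1) for the population mean gives limits of the sample mean plus and minus the tabled t times s'/sqrt(N). A minimal sketch follows; the function name is hypothetical, and the tabled t must be supplied by the reader from Table F.

```python
import math


def mean_confidence_limits(mean, s_prime, n, t_tabled):
    """Confidence limits for a population mean: mean -/+ t * s' / sqrt(N).

    t_tabled is the value read from a t table at N - 1 degrees of freedom.
    """
    half_width = t_tabled * s_prime / math.sqrt(n)
    return mean - half_width, mean + half_width


# 95 per cent limits for the IQ example (t = 2.26 at n = 9)
lower, upper = mean_confidence_limits(109, 13.2, 10, 2.26)
print(round(lower, 1), round(upper, 1))
```

Replacing 2.26 by 1.83 in the call gives the 90 per cent limits in the same way.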
Ordinarily the most convenient computational form of formula
(11.1) is

$$t = \frac{\bar{X} - \mu}{\sqrt{\dfrac{N\Sigma X^2 - (\Sigma X)^2}{N^2(N - 1)}}} \tag{11.2}$$

in which X represents the raw or gross scores. It is left as an exercise
for the student to derive (11.2) from (11.1).




SIGNIFICANCE OF CORRELATION COEFFICIENTS 

A sample correlation coefficient is said to be significant if it leads
to rejection of the hypothesis that the population coefficient is zero. In
sampling from a normal bivariate population in which the
product-moment coefficient, $\rho_{xy}$, is zero, the ratio

$$t = \frac{r_{xy}\sqrt{N - 2}}{\sqrt{1 - r_{xy}^2}} \tag{11.3}$$

is distributed as t with $N - 2$ degrees of freedom. Only the hypothesis
that $\rho_{xy} = 0$ can be tested by formula (11.3). Other hypotheses
regarding the value of $\rho_{xy}$ must be tested by the $z_r$ technique,
previously described. Formula (11.3) may be readily adapted to test
whether a partial correlation coefficient is significantly different from
zero. The first-order coefficient $r_{12.3}$ involves $N - 3$ degrees of
freedom; the second-order coefficient $r_{12.34}$ involves $N - 4$ degrees
of freedom; and so on.
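Formula (11.3) is easily computed. In the sketch below, the function name and the `order` argument are hypothetical; the adaptation to partial coefficients is assumed to consist of replacing N - 2 by the reduced number of degrees of freedom named above, which is the usual form of the test.

```python
import math


def t_for_correlation(r, n_pairs, order=0):
    """t ratio for testing whether a correlation coefficient differs from zero.

    order = 0 for a simple coefficient (N - 2 degrees of freedom),
    1 for a first-order partial (N - 3), 2 for a second-order partial (N - 4).
    """
    df = n_pairs - 2 - order
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df


t, df = t_for_correlation(0.50, 27)   # illustrative values, not from the text
print(round(t, 2), df)
```

The resulting t is referred to a table of t at the returned number of degrees of freedom.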
When the number of pairs of observations N is about 8 or more,
the significance of the rank difference correlation coefficient $r_d$ may
be tested satisfactorily by referring the ratio

$$t = \frac{r_d\sqrt{N - 2}}{\sqrt{1 - r_d^2}} \tag{11.4}$$

to a table of t under $N - 2$ degrees of freedom. The significance of
$r_d$ in samples of size less than 8 is best tested by methods described in
Ref. 44, p. 66.

The t test of the significance of the point biserial coefficient of
formula (7.17) is the same as that of $r_d$ given in (11.4) above.

The t technique makes it possible to infer soundly from quite small
samples the presence or absence of correlation in the population, and
this has perhaps tended to obscure the necessity of large samples in
dependable correlation analysis. Correlation and regression coefficients
in small samples tend to be poor estimates of the corresponding
parameters, and consequently provide little reliable information
regarding the amount of variation in the dependent variable which is
explained by the independent variable(s). A small sample regression
equation tends to be a poor approximation of the population regression
equation. In the small sample situation, predictions are subject not
only to the usual error of estimate but to the relatively large sampling
errors that infest the regression equation. It is to be remembered,
particularly in correlation work, that the significance and the reliability
of sample statistics are two quite different matters.


SIGNIFICANCE OF DIFFERENCES BETWEEN TWO CORRELATION 
COEFFICIENTS INVOLVING A COMMON VARIABLE 


A problem which frequently arises in test selection is that of
determining whether in a given sample the coefficient of correlation
$r_{12}$ between a criterion variable and a predictor variable is
significantly different from the coefficient $r_{13}$ between the same
criterion and a second predictor.

Under the assumption that the criterion variable is homoscedastic
and normally distributed for each set of values of the predictor
variables, Hotelling shows (Ref. 20) that, in all possible samples for which
the predictors have the same set of values as those observed in the
given sample, the statistic

$$t = (r_{12} - r_{13})\sqrt{\frac{(N - 3)(1 + r_{23})}{2(1 + 2r_{12}r_{13}r_{23} - r_{12}^2 - r_{13}^2 - r_{23}^2)}} \tag{11.5}$$

follows the t distribution with $N - 3$ degrees of freedom. Consider the
following example.


EXAMPLE. In the class of 138 freshmen, Table A, Appendix, the
following correlations are present: semester averages with VAT
scores, $r_{12} = .46$; semester averages with MAT scores, $r_{13} = .30$;
VAT scores with MAT scores, $r_{23} = .28$. Is $r_{12}$ significantly greater
than $r_{13}$ at the 5 per cent level?

The hypothesis to be tested is $H_0: \rho_{12} - \rho_{13} = 0$ against the
alternative $H_a: \rho_{12} - \rho_{13} > 0$. The one-sided alternative is
appropriate because we expect the VAT to show better correlation than
the MAT with semester averages, and we want special protection
against accepting $H_0$ if $H_a$ is true. (See p. 229.) Substituting in
(11.5), we have

$$t = (.46 - .30)\sqrt{\frac{(135)(1 + .28)}{2[1 + 2(.46)(.30)(.28) - (.46)^2 - (.30)^2 - (.28)^2]}} = 1.78.$$

Referring to Table F, we find that the P corresponding to 1.78 at
n = 120 (as close as we can come to 135) is less than .05. We reject
$H_0$ and conclude that the correlation of VAT with semester average
is significantly higher than that of MAT.
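The substitution in formula (11.5) can be verified as follows. The function name is hypothetical; the computation itself follows the formula as given.

```python
import math


def hotelling_t(r12, r13, r23, n):
    """t of formula (11.5) for comparing two correlations that share
    variable 1, with N - 3 degrees of freedom."""
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))
    return t, n - 3


t, df = hotelling_t(0.46, 0.30, 0.28, 138)
print(round(t, 2), df)
```

The script reproduces the t of 1.78 with 135 degrees of freedom used in the example.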

It will be seen that the conditions of the above test are quite restric- 
tive. However, when sample size is large or moderately so, the condi- 
tions presumably can be relaxed sufficiently to permit use of the test in 
most practical situations. At present, it appears to be the most con- 
venient and useful test available for comparing predictor variables. 


Inferences from Two Samples 


INDEPENDENT SAMPLE MEANS 

Fisher (Ref. 14) has shown that the t distribution has broader
application than was originally contemplated. He has demonstrated
that any ratio whose numerator is a normal deviate and whose
denominator is an independent estimate of the standard deviation of the
numerator is distributed as t with degrees of freedom n equal to the
number of degrees of freedom attaching to the estimate of the standard
deviation. Among the more important of these applications is the
so-called t test of the difference between sample means.

In random, independent samples from normal populations with
the same mean and the same standard deviation $\sigma$, the statistic,

$$t = \frac{|\bar{X}_1 - \bar{X}_2| - K}{s'\sqrt{1/N_1 + 1/N_2}} \tag{11.6}$$

where the $\bar{X}$'s and N's have their usual meaning, K is the
hypothesized difference (usually zero) between population means, and $s'$
is the best estimate of $\sigma$ provided by the two samples, follows the t
distribution with n (= $N_1 + N_2 - 2$) degrees of freedom. Thus,
inferences about differences between two population means can be made
from the t ratio.

The estimate $s'$ is obtained by adding the separate sums of squares
and dividing by $N_1 + N_2 - 2$:

$$s'^2 = \frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2 - 2}$$

It will be noted that $s'$ is based upon two sums of squares. Since one
degree of freedom is lost in computing each sum, the number of
independent observations contributing to $s'$ is $N_1 + N_2 - 2$.
Substituting for $s'$ in (11.6) we have

$$t = \frac{|\bar{X}_1 - \bar{X}_2| - K}{\sqrt{\left(\dfrac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2 - 2}\right)\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}} \tag{11.7}$$

The sum of squares $\Sigma x^2$ in a sample usually is most easily obtained
from the raw scores by the formula

$$\Sigma x^2 = \frac{1}{N}\left[N\Sigma X^2 - (\Sigma X)^2\right]$$

As an illustration of the t test, we have the following example:


EXAMPLE. In a transfer of training experiment, two small groups, of
5 and 6 students, were randomly selected. One group received
intensive training in a certain method of solving algebra problems, the
other group in a second method. At the end of the experimental
period, both groups were given a test containing 20 original problems.
The scores on the test are shown below. Is the difference between
the means of the groups significant at the 5 per cent level?

        GROUP I (N1 = 5)      GROUP II (N2 = 6)
          X1      X1^2          X2      X2^2
          18      324           13      169
          17      289            9       81
          15      225            9       81
          10      100            7       49
           6       36            6       36
                                 6       36
SUM       66      974           50      452

We need to test the hypothesis, $H_0: \mu_1 - \mu_2 = 0$ against the
alternative, $H_a: \mu_1 - \mu_2 \neq 0$. To test $H_0$, we compute

$$\bar{X}_1 = \frac{66}{5} = 13.2, \qquad \bar{X}_2 = \frac{50}{6} = 8.3,$$

$$\Sigma x_1^2 = \frac{5(974) - (66)^2}{5} = 102.8, \qquad \Sigma x_2^2 = \frac{6(452) - (50)^2}{6} = 35.3.$$

Substituting in (11.7) we have

$$t = \frac{13.2 - 8.3}{\sqrt{\left(\dfrac{102.8 + 35.3}{5 + 6 - 2}\right)\left(\dfrac{5 + 6}{(5)(6)}\right)}} = \frac{4.9}{2.37} = 2.07.$$

Entering Table F at n = 5 + 6 - 2 = 9, we find that 2.07 falls
between 1.83 and 2.26, so that .05 > P > .025. Doubling for the
two-sided test, we have .10 > P > .05. Since P is greater than .05
the difference is not significant at the 5 per cent level. At that level,
we cannot reject $H_0$. However, the difference approaches significance,
and the experiment might well be repeated with a larger number of
students.
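The computations of this example may be checked by a short script based on formula (11.7). The function name is hypothetical. Because the script carries the means unrounded (8.33 rather than 8.3), it yields a t of about 2.05 instead of the 2.07 shown above; the conclusion is unaffected.

```python
import math


def pooled_t(group1, group2, k=0.0):
    """t of formula (11.7) for two independent samples, using the pooled
    estimate of the population variance. Returns t, the denominator
    (standard error of the difference), and the degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    # Sums of squared deviations obtained from the raw scores
    ss1 = sum(x * x for x in group1) - sum(group1) ** 2 / n1
    ss2 = sum(x * x for x in group2) - sum(group2) ** 2 / n2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    t = (abs(mean1 - mean2) - k) / std_error
    return t, std_error, n1 + n2 - 2


group_i = [18, 17, 15, 10, 6]
group_ii = [13, 9, 9, 7, 6, 6]
t, std_error, df = pooled_t(group_i, group_ii)
print(round(t, 2), round(std_error, 2), df)
```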
If desired, formula (11.7) may be used to determine confidence
limits for the difference between population means. After the
denominator has been obtained, it is multiplied by the t value corresponding
to the selected limits and the product is subtracted from and added
to the difference between sample means. The 95 per cent confidence
limits for the difference between population means in the above
example are $(13.2 - 8.3) \pm 2.26(2.37)$ or about -.5 and 10.3. The
advantages of confidence intervals over tests of hypotheses were pointed
out on pages 259-260.

The t test is applicable to samples of any size; unless the N's are
small, however, the results obtained through its use do not differ
materially from the results obtained by the z-ratio procedures of the
preceding chapter. The t test assumes that the sampled populations
are normal and equally variable. When the samples are small, there
are no satisfactory ways to validate the assumptions; however, there is
considerable empirical evidence that the assumptions can be relaxed
in practical work. Moderate departures from normality and equality
of variance in the populations appear to have little effect on the t test
for differences between means. (See Refs. 3 and 5.)


RELATED SAMPLE MEANS 

It may be shown that the t ratio for testing the significance of
differences between related-sample means is equivalent to the normal z
procedures illustrated on pp. 264-266. If we bring the procedures
together, we may write

$$t = \frac{|\bar{X}_1 - \bar{X}_2| - K}{\sqrt{\dfrac{N\Sigma D^2 - (\Sigma D)^2}{N^2(N - 1)}}} \tag{11.8}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the sample means, K is the hypothesized
difference (usually zero) between population means, D is the difference
in any pair of scores, and N is the number of pairs of scores. The t of
(11.8) has $N - 1$ degrees of freedom.


We may use the data of Table 10.4 to illustrate the test. Let us test
the hypothesis $H_0: \mu_1 - \mu_2 = 0$ against the two-sided alternative
$H_a: \mu_1 - \mu_2 \neq 0$ at the 5 per cent level. When we substitute the
appropriate statistics from Table 10.4 in formula (11.8), we have

$$t = \frac{11.46 - 12.89}{\sqrt{\dfrac{28(764) - (-40)^2}{28^2(28 - 1)}}} = \frac{-1.43}{.967} = -1.48.$$

According to Table F, a t of -1.48 with 27 degrees of freedom
corresponds to a P between .20 and .10 for a two-sided test. The null
hypothesis is clearly tenable.

Confidence limits or intervals in the related-sample case are
determined by procedures similar to those used in the independent-sample
case. In the above example, the 95 per cent confidence interval
for the true difference between means is $-1.43 \pm 2.05(.967)$ or -3.41
to .55.

It will be recalled that the z procedures of the preceding chapter
gave about the same results in this example. This is expected when
samples are as large as the present. As sample size decreases, the t
and z results diverge more and more. The t results are exact.
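The related-sample computations can be reproduced from the summary figures quoted from Table 10.4 (N = 28 pairs, sum of differences -40, sum of squared differences 764). The sketch below is illustrative: the function name is hypothetical, the sign of the mean difference is retained as in the worked example rather than taking the absolute value, and the tabled t of 2.05 is supplied by hand.

```python
import math


def related_t_from_sums(sum_d, sum_d_sq, n, k=0.0):
    """t of formula (11.8) from the sum and sum of squares of the paired
    differences D. Returns t, the standard error, and degrees of freedom."""
    mean_d = sum_d / n
    std_error = math.sqrt((n * sum_d_sq - sum_d ** 2) / (n ** 2 * (n - 1)))
    return (mean_d - k) / std_error, std_error, n - 1


t, std_error, df = related_t_from_sums(-40, 764, 28)

# 95 per cent confidence limits, with t = 2.05 read from a t table at n = 27
mean_d = -40 / 28
lower, upper = mean_d - 2.05 * std_error, mean_d + 2.05 * std_error
print(round(t, 2), round(std_error, 3), df, round(lower, 2), round(upper, 2))
```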


CONCLUDING REMARKS 


of any size, from 2 upward, draw. 
tion. Its limitations are owing, 
uncertain nature of small sample 


It cannot be over-emphasized that esti
of little value in indicating the true valu 


ve us to base inferences, 
‘a. In such cases we can rarely, if ever, 


of a parameter within 


not the value of the correlation in the parent population, but, more
generally, whether this value can have arisen from an uncorrelated
population, i.e., whether it is significant of correlation in the parent.*


In addition to their susceptibility to relatively large sampling errors 
small sample statistics tend to be susceptible to errors of a 
For example, if the obtained scores in a small sample are not reliable, 
their mean and variance may be poor estimates of the mean and 
variance of the true scores of the sample. A second measurement on 
the same sample might yield a quite different mean and variance. In 
such a situation, any inferences at all tend to be questionable. 

The fact that the t distribution is logically applicable to small
samples does not, as a rule, lessen the desirability of large samples.


* G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics.
Copyright 1950 by Charles Griffin & Co., Ltd., London, and used by their
permission, p. 485.


EXERCISES


1. From Table F, what probabilities correspond to each of the following
values of n and t: (a) n = 1, t = -31.82; (b) n = 1, t = +31.82;
(c) n = 1, t = ±31.82; (d) n = 10, t = -1.09; (e) n = 15, t = ±1.34;
(f) n = ∞, t = ±1.96?

2. What values of t correspond to the 5 per cent regions of rejection sketched
in Figure 9.2 when n = 2? When n = 10? When n = 60?

3. What does the last row of Table D tell us about the t curve?

4. The scores of 10 students on an 80-item statistics test were 41, 50, 65, 61,
56, 59, 74, 23, 47, and 54. If the mean score of students in the past on
this test is 50, is the present class of 10 an unusual one?

5. The ratios of posttest to pretest scores in an experiment on learning for
independent samples of 5 male and 6 female subjects are shown below.
(a) Is it reasonable to believe that the population mean of males is 100,
the mean expected under the hypothesis of no gain? (b) Of females?
(c) Is the difference between male and female means significant at the 5
per cent level?

RATIOS
MALE: 118, 116, 110, 110, 106
FEMALE: 112, 109, 107, 106, 106, 105

6. In Exercise 5, above, find the 95 per cent confidence limits for (a) the
male population mean, (b) the female population mean, and (c) the
difference between male and female population means.

7. What are the assumptions underlying the t sampling distribution?


8. In what two ways is the t test for a difference between means superior to
the z test of the preceding chapter? In what way is it more restricted?

9. Apply the t test for differences between means of two related samples to
the data of Table 10.5. Would you expect the t table to give about the
same P value as the normal table in this case? Why?

10. Are the correlation coefficients for the data of Table 7.5 and Table 7.6
significant at the 10 per cent level?

11. Using formula (11.3), determine the absolute value of $r_{xy}$ needed for
significance at the 5 per cent level when N = 5; when N = 10; when
N = 15. Interpret the results.

12. In a sample of 86 college freshmen, the coefficient of correlation, $r_{12}$,
between grade-point averages and scores on a timed reading test was .35,
while the coefficient, $r_{13}$, between averages and scores on an untimed
reading test was .55. The timed and untimed reading test scores were
themselves correlated with $r_{23} = .65$. Is the difference between $r_{12}$
and $r_{13}$ significant at the 5 per cent level?

CHAPTER XII 


The $\chi^2$ Sampling Distribution


A common problem in research work is that of determining whether 
a Set of observed frequencies is consistent with the set of frequen- 
cies expected if some theory or hypothesis about a population is 
true. For example, we may hypothesize that participation in student 
activities is unrelated to school grades and compare the frequencies 
Observed with the frequencies expected if the hypothesis is in fact true, 
classifying our observations as in Table 12.2, p. 296. If the discrepancies 
are too great to be reasonably ascribed to sampling fluctuations, the 
hypothesis is discredited. As another example, we may wish to de- 
termine whether the frequencies in the classes of a sample distribution 
differ sufficiently from theoretical normal frequencies to discredit the 
assumption of normality in the sampled population. In general, this 
Sort of problem arises whenever we are interested in determining 
Whether sample frequencies in specified classes are compatible with the 
frequencies we would expect in these classes if some theory about the 
Population is true. me 
The statistic used in such problems is known as χ² (chi square), 
which may be defined as 

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad (12.1)$$




where fo is the observed frequency in a class and fe is the frequency 
expected if a theory or hypothesis is true, the summation being over 
all classes in which comparisons are made. 

The sampling distribution of χ² is of wide usefulness. We shall find 
that it is applicable to several problems involving ranked data as well 
as to problems concerning the compatibility of observed and expected 
frequencies in classes. 


THE DISTRIBUTION OF χ² 


In an experimental study of the χ² sampling distribution, 120 
samples of 40 each were randomly selected from the population of 
Table B, Appendix, in which the frequencies of the attributes A, B, and 
not-A-or-B are 200, 100, and 100, respectively. Hence, in samples of 
40, the theoretical or expected frequencies are 20, 10, and 10. The 
frequencies of A, B, and not-A-or-B in each sample were recorded, 
and the value of χ² as defined in (12.1) was computed. To illustrate, 
if the frequencies in a sample were 23, 6, and 11, we would have in 
tabular form 


                                      ATTRIBUTE 
                               A        B        Not-A-or-B 

Observed frequency fo          23       6        11 
Theoretical frequency fe       20       10       10 

fo − fe                        3        −4       1 

(fo − fe)²                     9        16       1 

(fo − fe)²/fe                  9/20     16/10    1/10 


so that the value of χ² in that sample would be 9/20 + 16/10 + 1/10 
or 2.15. 
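
The arithmetic of formula (12.1) for this sample can be checked with a few lines of code. What follows is only a sketch; the function name chi_square is an illustrative choice and not notation from the text.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all classes, as in formula (12.1)."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Frequencies of A, B, and not-A-or-B in the illustrative sample of 40,
# with the expected frequencies 20, 10, and 10.
observed = [23, 6, 11]
expected = [20, 10, 10]

value = chi_square(observed, expected)
print(round(value, 2))  # 2.15
```

The same function serves for any set of classes, provided the two lists are given in the same order.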

The 120 values of χ² thus obtained from the chance discrepancies 
between observed and expected frequencies are grouped in Table 12.1. 
The histogram of the distribution is shown in Figure 12.1. It will be 
noted that the histogram follows roughly the shape of the superimposed 
curve of the theoretical χ² distribution. 

The frequency distribution and the histogram obscure an important 
feature of the χ² values. The values are discrete. In this situation 
they necessarily differ by 1/20; moreover, not all multiples of 1/20 
can be equalled by them. The student can satisfy himself that this is so 
by taking samples of 40 from the population (or by arbitrarily manipulating 
frequencies) and computing the resulting χ²'s. We shall return 
to this feature a little later. 
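
The discreteness of χ² in this experiment can be seen by listing every value the statistic can take. The sketch below "arbitrarily manipulates frequencies" in the sense suggested above: it runs through every possible division of 40 observations among the three classes and computes χ² exactly with rational arithmetic. The names EXPECTED, chi_square_exact, and possible are illustrative assumptions.

```python
from fractions import Fraction

EXPECTED = (20, 10, 10)  # expected frequencies of A, B, not-A-or-B in samples of 40

def chi_square_exact(observed):
    """Formula (12.1) evaluated in exact rational arithmetic."""
    return sum(Fraction((fo - fe) ** 2, fe) for fo, fe in zip(observed, EXPECTED))

# Every possible split of 40 observations among the three classes.
possible = set()
for a in range(41):
    for b in range(41 - a):
        possible.add(chi_square_exact((a, b, 40 - a - b)))

# Show the five smallest attainable values as fractions.
print([str(v) for v in sorted(possible)[:5]])
```

Every attainable value is a whole number of twentieths, but values such as 1/20 and 2/20 never occur; the smallest nonzero value is 3/20.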


[Figure 12.1. Vertical axis: Frequency. Horizontal axis: Value of χ².] 

Fig. 12.1. Histogram of the distribution of 120 χ² values (Table 12.1) and 
curve of the theoretical χ² distribution with two degrees of freedom. 


TABLE 12.1 


Values of χ² in 120 Samples of 40 Each from a Threefold Population 


χ²         f 

6.75- 
6.00- 
5.25- 
4.50- 
3.75- 
3.00-      1 
2.25- 
1.50-      21 
0.75-      16 
0.00-      43 


Several characteristics of the χ² statistic and its sampling distribution 
can be deduced from the definition (12.1) and the table and figure. 
Its value depends upon discrepancies between observed and expected 
frequencies. If there are no discrepancies, i.e., if the frequencies are in 
perfect agreement, χ² = 0. Since the discrepancies are squared, the 
value of χ² cannot be negative. Since the squared discrepancies are 
summed, the greater the number of discrepancies the greater the range 
of the χ² values. If in the above experiment there had been, say, six 
classes instead of three involved in the comparisons of observed and 
expected frequencies, the range would have been considerably broader. 

The sampling distribution depends entirely upon the number of 
discrepancies contributing to the value of χ² which are independent, 
i.e., the number of degrees of freedom n. Thus, in effect, there are many 
distributions, one corresponding to each n. 

The curves of the theoretical distribution of χ² having 5, 10, and 
20 degrees of freedom, respectively, are shown in Figure 12.2. It will 
be noted that the shape and position of the curves vary with n, the 
number of degrees of freedom: the greater the number, the wider and 
the more symmetrical the curve. The curve approaches the normal curve 
as n increases; in fact, the normal curve is the limiting form of the χ² curve. 

[Figure 12.2. Vertical axis: Relative frequency. Horizontal axis: Value of χ².] 

Fig. 12.2. Curves of theoretical χ² distributions for n = 5, n = 10, and n = 20. 


Table G, Appendix, shows the values of χ² which correspond to 
given probabilities for n's from 1 to 30. The table is read and interpreted 
as follows. When n = 2, the .05 point or the value beyond 
which 5 per cent of the area under the curve lies is 5.99. Hence, for 
n = 2, if χ² equals or exceeds 5.99, the hypothesis that a given set of 
discrepancies between observed and expected frequencies is due to 
sampling fluctuations or chance can be rejected at the 5 per cent level. 
On the other hand, when n = 2, the .95 point or the value beyond which 
95 per cent of the area lies is .103. It follows that only 5 per cent of the 
χ²'s computed from random samples are as small as .103. Hence, for 
n = 2, if χ² is .103 or less, the agreement between observed and expected 
frequencies is so good that it raises the question whether the 
sampling technique permitted chance to operate freely, i.e., whether 
the compatibility of the frequencies was fairly tested. 
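
The two Table G entries quoted for n = 2 can be verified without the table. For two degrees of freedom the area beyond a given value of χ² has the simple closed form e^(−χ²/2); this is a standard result that is not derived in the text, and the function name upper_tail_2df is merely illustrative.

```python
import math

def upper_tail_2df(x):
    """Area beyond x under the chi-square curve with 2 degrees of freedom."""
    return math.exp(-x / 2)

print(round(upper_tail_2df(5.99), 3))   # 0.05
print(round(upper_tail_2df(0.103), 3))  # 0.95
```

For other numbers of degrees of freedom the formula is different, so this shortcut should not be applied beyond n = 2.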

The interpretation of other χ² values for n = 2 and of values for 
the other n's of Table G is similar to the above. In general, when the 
probability of a χ² value is about .10 or more, the disagreement between 
the observed and expected frequencies could easily arise owing to 
chance, and there is little reason to doubt the theory or hypothesis 
being tested. When P is less than .10, the hypothesis is in doubt; and 
when P is about .05 or less, the hypothesis ordinarily is considered 
untenable. Of course, it is open to the investigator to adopt a more 
(or less) exacting significance level. 

On the other hand, when P is about .90 or more, the agreement 
between observed and expected frequencies, that is, between observation 
and hypothesis, is better than anticipated when the hypothesis is true. 
The hypothesis is not in doubt, but the experimental procedures and 
statistical computations are. As P approaches 1, the procedures and 
computations become more and more suspect. 


Applications to Frequency Data 


There are three types of problems that require comparisons of observed 
and theoretical or expected frequencies and the use of the χ² 
sampling distribution. The types differ chiefly because the theoretical 
frequencies involved in each are determined by somewhat different 
methods. They are essentially similar in nature, and the assumptions 
underlying the application of the χ² test are the same for each. Before 
discussing the problems, we need to examine these assumptions. 


ASSUMPTIONS UNDERLYING THE χ² DISTRIBUTION 


There are several approximations involved in fitting continuous 
curves, such as those of Figure 12.2, to the distribution of the necessarily 
discrete values of χ², as computed from frequency data. These 
approximations may be poor unless the distribution of observed frequencies 
about expected frequencies in each class is sufficiently normal 
to permit approximation by the normal curve. For example, in the 
sampling experiment described above, it is assumed that the numbers 
of A's, B's, and not-A's-or-B's are distributed about 20, 10, and 10, 
respectively, in approximately normal form. In general, this assumption 
is considered to be satisfied if the fe in any class is not very small. 
When an fe is very small, the fo's in successive samples tend to range 
more widely above fe than below, since an fo cannot be less than 0. 
As we have seen, such a situation usually results in skewness. How 
small is a difficult question, but there is a great deal of experimental 
evidence and rather wide agreement among statisticians that in no class 
should fe be less than 5 and that it is safe to go that low only if the total 
of the fe's over all the classes is about 40 or more. (But see p. 302.) 

It is also assumed that the probability of obtaining any of the events 
in question remains constant, or practically so, during the sampling 
process. This assumption is considered to be satisfied if (1) the popula- 
tion is large relative to the sample, (2) the events are independent, and 
(3) the sample is random. 

The sampling distribution of χ² as defined in (12.1) is based upon the 
number of independent discrepancies between observed and expected 
frequencies. This number is the degrees of freedom n. It is equal to 
the total number of classes in which comparisons are made, i.e., the 
total number of discrepancies, less the number of respects or constants 
in which the expected and observed frequencies are forced to agree. 
The calculation of n in a given problem is relatively simple and will be 
illustrated in connection with the three general types of problems involving 
frequency data to which the χ² test is applicable. 


THE χ² TEST WHEN THEORETICAL FREQUENCIES CAN BE 
DETERMINED FROM THE SIZE OF THE SAMPLE 


The simplest type of problems to which the χ² test is applicable 
comprises those in which the expected frequencies can be determined 
from the size of the sample and the hypothesis to be tested. As an 
example of problems of this type, consider the following. In a study 
of the social desirability of five occupations, A, B, C, D, and E, each of 
40 subjects was asked which occupation he considered most socially 
desirable. Eight subjects selected occupation A, 12 selected B, 10 
selected C, 3 selected D, and 7 selected E. Under the hypothesis that 
the occupations are equally desirable, the expected frequencies are 
40/5 or 8. The data may be arranged in tabular form, 


                              OCCUPATION 
                    A       B       C       D       E 
fo                  8      12      10       3       7 
fe                  8       8       8       8       8 
fo − fe             0       4       2      −5      −1 
(fo − fe)²          0      16       4      25       1 
(fo − fe)²/fe       0    2.00     .50    3.12     .12 




The value of Σ(fo − fe)²/fe or χ² is 5.74 with 4 degrees of freedom. 
Consulting Table G at n = 4, we find that since 5.74 is between 5.39 
and 7.78, P is between .25 and .10, i.e., .25 > P > .10. The hypothesis 
cannot be rejected. There is little reason to doubt that in the population 
represented by the sample of 40, the occupations are considered 
of equal social desirability. 
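
The occupations problem may be worked through in code as a check on the hand computation. This is a sketch under stated assumptions: the helper names are illustrative, and the closed form used for the area beyond χ² with 4 degrees of freedom, e^(−χ²/2)(1 + χ²/2), is a standard result not given in the text. Without rounding of the separate terms, χ² comes to 5.75 rather than the 5.74 obtained above from terms rounded to two decimals.

```python
import math

def chi_square(observed, expected):
    """Formula (12.1)."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

def upper_tail_4df(x):
    """Area beyond x for 4 degrees of freedom: exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)

observed = [8, 12, 10, 3, 7]                                # choices of occupations A to E
expected = [sum(observed) / len(observed)] * len(observed)  # 40/5 = 8 in each class

chi2 = chi_square(observed, expected)
p = upper_tail_4df(chi2)
print(chi2, round(p, 2))  # 5.75 0.22
```

The probability of about .22 falls between .25 and .10, in agreement with the reading of Table G.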

In problems of this type, the number of degrees of freedom, n, 
is equal to the number of classes, k, in which comparisons between fo 
and fe are made, minus one. One degree of freedom is lost because the 
expected frequencies are determined from the size of the sample. This 
means that Σfo and Σfe are forced to agree and that, consequently, 
the differences between fo and fe must sum to zero. Hence, one of the 
differences or discrepancies contributing to the value of χ² is not free 
to vary. (See p. 277.) In the above example, only 4 of the 5 differences 
are free to vary, and as a result χ² has only 4 degrees of freedom. 

A hypothesis about the proportion, φ, in a twofold population 
may readily be tested by χ². Suppose that in a random sample of 100 
voters 57 favor Candidate A and 43 favor Candidate B, and suppose 
that we wish to know whether it is reasonable to think that opinion in 
the population is evenly divided. Under the null hypothesis, H0: φ = 
.50, we expect 50 of the 100 voters to favor A and 50 to favor B. Hence, 
the differences between observed and expected frequencies are 7 and 
−7. Since there are only two comparisons, χ² will have 2 − 1 or 1 
degree of freedom. Now in all situations where χ² has only 1 degree 
of freedom, a correction for continuity should be made. The correction, 
known as Yates' correction, is made by reducing the absolute value of 
the differences by .5. It will be recalled that a similar correction is 
made in the proportions test of formula (10.5). Making the correction 
we have, by formula (12.1), χ² = (6.5)²/50 + (−6.5)²/50 = 1.69 
with 1 degree of freedom. Turning to Table G, we find .25 > P > .10 
and conclude that it is entirely reasonable to think opinion in the 
population is evenly divided. 
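
Yates' correction for the voter example can be sketched as follows. The function names are illustrative assumptions, and the expression used for the area beyond χ² with 1 degree of freedom, erfc(√(χ²/2)), is a standard identity rather than something taken from the text.

```python
import math

def chi_square_yates(observed, expected):
    """Formula (12.1) with each absolute discrepancy reduced by .5 (1 df only)."""
    return sum((abs(fo - fe) - 0.5) ** 2 / fe for fo, fe in zip(observed, expected))

def upper_tail_1df(x):
    """Area beyond x under the chi-square curve with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

chi2 = chi_square_yates([57, 43], [50, 50])
p = upper_tail_1df(chi2)
print(round(chi2, 2), round(p, 2))  # 1.69 0.19
```

The probability of about .19 lies between .25 and .10, as found from Table G.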

If we apply the proportions test of formula (10.5) to the above 
data, we get z = (.57 − .50 − .005)/√(.50(.50)/100) = 1.30, which 
is the square root of the value, 1.69, obtained in the χ² test. The relationship 
z = √χ² always holds for z and χ² obtained from the 
same frequency data where there is one degree of freedom. Thus, 
hypotheses about a twofold population proportion, φ, may be tested 
by either the z or χ² test. The latter is somewhat more easily applied, 
but is not useful in determining confidence intervals for φ. As emphasized 
in Chapter X, confidence intervals are generally more informative 
than tests of significance. Moreover, definite probability figures cannot 
be obtained directly from χ² tables ordinarily available. 
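
The relationship z = √χ² for one degree of freedom can be confirmed numerically on the voter data. The sketch below assumes that the proportions test takes the form shown in the worked figures above, with a continuity correction of 1/(2N) = .005; the variable names are illustrative.

```python
import math

n, favor_a, phi0 = 100, 57, 0.50

# Proportions test with the continuity correction 1/(2N) = .005.
p_hat = favor_a / n
z = (abs(p_hat - phi0) - 1 / (2 * n)) / math.sqrt(phi0 * (1 - phi0) / n)

# Chi-square with Yates' correction for the same frequencies.
pairs = [(favor_a, n * phi0), (n - favor_a, n * (1 - phi0))]
chi2 = sum((abs(fo - fe) - 0.5) ** 2 / fe for fo, fe in pairs)

print(round(z, 2), round(chi2, 2), round(z ** 2, 2))  # 1.3 1.69 1.69
```

Changing favor_a or phi0 gives other values of z and χ², but the square of the first continues to equal the second.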

Just how to state the null and alternative hypotheses in problems 
like the above is a good question. It would not be incorrect to write 
H0: Σ(fo − fe)² = 0 and HA: Σ(fo − fe)² ≠ 0, but as a rule the 
hypotheses can be and should be stated more meaningfully. Generally 
it is a good idea to write out the hypotheses in χ² tests. For example, in 
the illustrative occupations problem, we might write “H0: Occupations 
A, B, C, D, and E are considered of equal social desirability in the 
population,” and “HA: Occupations A, B, C, D, and E are not considered 
of equal social desirability in the population.” 

It is to be remembered that in problems where the k expected 
frequencies can be determined from the size of the sample and the 
hypothesis to be tested, χ² has k − 1 degrees of freedom. 


THE χ² TEST FOR CONTINGENCY DATA 


When the observations on two qualitative variables or one qualita- 
tive and one quantitative variable are classified in a two-way table, 
they are known as contingency data and the table as a contingency 
table. Generally, in dealing with such data we want to know whether 
the variables are related. Let us see how χ² may be used to test for the 
significance of sample relationships. 

The semester averages, in three categories, and extent of participation 
in student activities of 146 college freshmen are shown in contingency 
Table 12.2. For the moment, ignore the numbers in parentheses 
in the table, noting only that 10 freshmen with averages below 
70.0 participated much, 16 an average amount, 14 little, and so on. 


TABLE 12.2 
Semester Average and Extent of Participation in 
Student Activities of 146 College Freshmen 
(Data from Table A, Appendix) 


PARTICIPATION IN                   SEMESTER AVERAGE 
STUDENT ACTIVITIES      Below 70.0     70.0-80.0     Above 80.0     TOTAL 

Much                    10 (11.5)      29 (19.0)      3 (11.5)        42 
Average                 16 (14.2)      23 (23.5)     13 (14.2)        52 
Little                  14 (14.2)      14 (23.5)     24 (14.2)        52 

TOTAL                   40             66            40              146 

If there is a relationship between semester averages and participation 
in student activities, the frequencies in certain cells will tend to be relatively 
great. If the relationship is positive, relatively greater frequencies 
will appear in the lower left, middle, and upper right cells; if negative, 
in the upper left, middle, and lower right. On the other hand, if there 
is little or no relationship, the frequencies will tend to show only 
proportional density in the respective cells; i.e., cell frequencies will be 
distributed in the same ratio as the marginal totals. 

The first step in applying the χ² test of the significance of the relationship 
between semester average and participation is that of determining 
the expected frequencies. The null hypothesis is that the two 
are not related, and if the hypothesis is true the expected frequencies are: 


CELL               EXPECTED FREQUENCY 
Upper left         (42 × 40)/146 = 11.5 
Upper middle       (42 × 66)/146 = 19.0 
Upper right        (42 × 40)/146 = 11.5 
Middle left        (52 × 40)/146 = 14.2 
Middle middle      (52 × 66)/146 = 23.5 
Middle right       (52 × 40)/146 = 14.2 
Lower left         (52 × 40)/146 = 14.2 
Lower middle       (52 × 66)/146 = 23.5 
Lower right        (52 × 40)/146 = 14.2 


These frequencies, shown in parentheses in Table 12.2, are the frequencies 
we would expect in the various cells if there were no relationship 
between semester averages and extent of participation in student 
activities. The student can verify that the expected frequencies in any 
row are distributed in the ratio 40:66:40, and those in any column in 
the ratio 42:52:52. 
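
The rule for expected frequencies, (row total × column total)/N, is easily carried out for the whole table at once. In the sketch below the observed cell frequencies are those implied by the marginal totals of Table 12.2 and the discrepancies worked out in this section; the variable names are illustrative.

```python
# Observed frequencies of Table 12.2. Rows: Much, Average, Little participation.
# Columns: semester average below 70.0, 70.0-80.0, above 80.0.
observed = [
    [10, 29, 3],
    [16, 23, 13],
    [14, 14, 24],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency in a cell = (row total x column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(fe, 1) for fe in row])
```

The three printed rows agree with the figures in parentheses in Table 12.2.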
After the expected frequencies are obtained, formula (12.1) may 
be applied as shown below. 


CELL               fo − fe     (fo − fe)²     (fo − fe)²/fe 
Upper left          −1.5          2.25             .20 
Upper middle       +10.0        100.00            5.26 
Upper right         −8.5         72.25            6.28 
Middle left         +1.8          3.24             .23 
Middle middle        −.5           .25             .01 
Middle right        −1.2          1.44             .10 
Lower left           −.2           .04             .00 
Lower middle        −9.5         90.25            3.84 
Lower right         +9.8         96.04            6.76 

                                          χ² =   22.68 


The value of χ² is 22.68 with, for reasons 
to be noted later, 4 degrees of freedom. Entering Table G at n = 4, we 
find that, since χ² is greater than 14.9, P is less than .005. The hypothesis 
that there is no relationship between semester averages and 
participation in student activities is strongly discredited. There is a 
highly significant relationship, which, by examination of the data, we 
find to be inverse. 
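
The whole contingency test can be put into one short routine. The following is a sketch only: the function name contingency_chi_square is an illustrative assumption, the observed frequencies are those implied by the marginal totals of Table 12.2, and the closed form for the area beyond χ² with 4 degrees of freedom is a standard result not given in the text. Because the expected frequencies are not rounded to one decimal here, χ² comes out near 22.6 rather than exactly 22.68.

```python
import math

observed = [[10, 29, 3], [16, 23, 13], [14, 14, 24]]  # Table 12.2

def contingency_chi_square(table):
    """Chi-square and degrees of freedom for an h x k contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            chi2 += (fo - fe) ** 2 / fe
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2, df = contingency_chi_square(observed)
# For 4 degrees of freedom the upper-tail area is exp(-x/2) * (1 + x/2).
p = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(round(chi2, 2), df, p < 0.005)
```

The conclusion is unchanged by the small difference in χ²: P is far below .005.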

The reason why χ² has 4 degrees of freedom in this problem is to 
be found in the procedure by which the fe's are determined. Under 
the hypothesis that there is only random association between the variables, 
the fe's in the various cells of the table are determined from the 
marginal totals, as shown above. The fo's and fe's thus are forced to 
agree in at least 5 of the 6 marginal totals, and hence in any row or in 
any column the discrepancies sum to 0. As a result only 4 of the 9 
discrepancies contributing to the value of χ² are independent or free to 
vary. Thus, χ² has only 4 degrees of freedom. 

It is important to note that although this procedure limits or restricts 
the information provided by the sample, it does not in any way 
introduce bias. Forcing the fe's and fo's to agree in marginal totals 
does not influence the relation between the variables. In one or more 
cells, depending upon n, the frequencies remain free to differ and to 
discredit the hypothesis. 

In general, in the contingency table consisting of h rows and k 
columns, the number of degrees of freedom n is (h − 1)(k − 1). 

All contingency table data which satisfy the assumptions underlying 
the χ² test may similarly be tested to see whether they indicate 
correlation in the sampled population. These include the data classified 
for fourfold point correlation analysis. Whether a fourfold point 
correlation coefficient calculated from the 2 × 2 table is significantly 
different from 0 can readily be determined by the χ² test. The 
computation of the fe's in all such tables follows that illustrated on 
p. 297. In general, the theoretical frequencies ... squaring in formula (12.1). 
There is another formula for computing χ² in the 2 × 2 table, 

$$\chi^2 = \frac{N\left(|AD - BC| - N/2\right)^2}{(A + B)(C + D)(A + C)(B + D)}, \qquad (12.2)$$


where A, B, C, and D are the frequencies in the upper left, upper right, 
lower left, and lower right cells, respectively, and N is the total frequency. 
Yates' correction is made in formula (12.2) by subtracting N/2 
from the absolute difference between AD and BC. Formula (12.2) is 
somewhat easier to use than (12.1); however, in small samples one 
should calculate the fe's to be sure that none is less than about 5. 
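
That formula (12.2) gives the same result as formula (12.1) with Yates' correction can be checked on any 2 × 2 table. The frequencies used below (12, 8, 5, 15) are hypothetical, chosen only for illustration, and both function names are assumptions of this sketch.

```python
def chi_square_2x2(a, b, c, d):
    """Formula (12.2): Yates-corrected chi-square for a 2 x 2 table."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_2x2_long(a, b, c, d):
    """Formula (12.1) with each absolute discrepancy reduced by .5."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    total = 0.0
    for fo, i, j in cells:
        fe = rows[i] * cols[j] / n  # expected frequency from the marginal totals
        total += (abs(fo - fe) - 0.5) ** 2 / fe
    return total

# Hypothetical cell frequencies A, B, C, D.
short = chi_square_2x2(12, 8, 5, 15)
long_form = chi_square_2x2_long(12, 8, 5, 15)
print(round(short, 2), round(long_form, 2))  # 3.68 3.68
```

The long form also yields the expected frequencies along the way, which is convenient when one must confirm that none is less than about 5.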

The χ² test may be used in testing the significance of the difference 
between two independent-sample proportions. The statistics from the 
two samples may be classified in a 2 × 2 table, essentially similar to the 
contingency table, and the χ² test run as usual, using either formula 
(12.1) or (12.2). Although the test is somewhat easier to apply than 
the z test discussed earlier, p. 258, it cannot be used to test the hypothesis 
H0: φ1 − φ2 = K, when K is other than 0, and it cannot be 
used to determine confidence intervals for the true difference. For 
these reasons, the z test is usually preferable. 

The χ² test may be used to test the significance of differences between 
two or more frequency distributions. Suppose that we have 
samples of urban, suburban, and rural families and suppose that we 
wish to know whether these indicate that urban, suburban, and rural 
populations differ in family size. If we classify the samples according 
to family size in the ten classes, 1, 2, 3, ..., 10 or more members, we 
shall have a 3 × 10 contingency table. To test the null hypothesis 
that the population distributions are alike, we would compute the 
expected frequencies from the marginal totals and apply formula (12.1) 
as in any contingency table. Our χ² value would have 18 degrees of 
freedom. (Why?) 


THE CONTINGENCY COEFFICIENT 


Sometimes it is useful to obtain a measure of the relationship between 
the variables of the contingency table. Such a measure is available 
in the contingency coefficient, C, defined in the formula 

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}. \qquad (12.3)$$

It will be seen that the contingency coefficient is readily obtained after 
χ² has been computed. For the data of Table 12.2, 

C = √(22.68/(146 + 22.68)) or ±.37. 

Before we can determine whether the relationship is positive or negative, 
we must examine the classifications in the table and note the cells 
in which the discrepancies between observed and expected frequencies 
are most pronounced. In this case, the relationship is negative in a 
meaningful sense. Those who participate much are well below expectation 
in the “above 80” class; those who participate little are well above 
expectation in that class. 
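
The contingency coefficient for Table 12.2 can be checked in two lines. The function name contingency_coefficient is an illustrative choice; the figures 22.68 and 146 are those of the text.

```python
import math

def contingency_coefficient(chi2, n):
    """Formula (12.3): C = sqrt(chi2 / (N + chi2)), taken without sign."""
    return math.sqrt(chi2 / (n + chi2))

c = contingency_coefficient(22.68, 146)
print(round(c, 2))  # 0.37
```

The routine returns the magnitude only; the direction of the relationship must still be read from the table itself.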

As a rule, no sign should be attached to C, since the coefficient 
indicates only the amount of the relationship between the variables. 
Any further interpretation must be made in the light of the nature and 
classification of the data. This is a disadvantage of C, although not 
ordinarily a serious one. When the direction of relationship has real 
meaning, it can be inferred by inspection of the table. 

The lower limit of C is zero; the upper limit depends on the 
number of cells in the contingency table, but it is always less than one. 
For example, in a 2 × 2 table C cannot exceed .71, whereas in a 4 × 4 
table its upper limit is .87. Hence, C's from dissimilar tables are not 
entirely comparable. 
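
The limits quoted for the 2 × 2 and 4 × 4 tables agree with a standard result, not stated in the text, that for a table with k rows and k columns the upper limit of C is √((k − 1)/k). The sketch below assumes that result and applies only to square tables; the function name is illustrative.

```python
import math

def max_contingency_coefficient(k):
    """Upper limit of C for a k x k table, assuming the limit sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)

for k in (2, 3, 4, 5):
    print(k, round(max_contingency_coefficient(k), 2))
```

The limit creeps toward one as k grows but never reaches it, which is why C's from tables of different size should be compared with caution.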

Despite its crudeness, C has several advantages. It is easy to compute. 
It involves no assumptions of linearity and normality. It can be 
used when the variables are continuous, discrete, or qualitative, or 
when one variable is of one kind and the other of another kind. 

C is frequently useful in stating the null and alternative hypotheses 
regarding contingency data. If we use ρC to indicate the population 
contingency coefficient, we may write H0: ρC = 0 and HA: ρC ≠ 0. 


GOODNESS OF FIT 


The χ² test is frequently useful in testing assumptions or hypotheses 
regarding the form of a population frequency distribution. 
When used in this connection, it is commonly referred to as a test of 
“goodness of fit.” The name is not particularly definitive, since all 
χ² tests of frequency data may be regarded as tests of goodness of 
agreement or fit of observed and theoretical frequencies. 

Essentially the test consists of determining whether a sample frequency 
distribution is sufficiently well fitted by some theoretical form, 
say the normal, to have arisen in sampling from a population distributed 
in that form. In illustration of the test, let us determine whether 
the agreement between the observed and theoretical frequencies of 
Table 6.2 is sufficiently good to support the assumption that the 
sampled population is of normal form. The fo and fe from that table 
are now shown in Table 12.3. As noted in Chapter VI, the fe are the 
theoretical normal frequencies in a distribution with N = 138, X̄ = 
552.11 and s = 79.32. They are the frequencies expected under the 
assumption of normality. 

Before computing χ², the expected frequencies in the upper three 
and in the lower two classes are pooled. This should always be done 
when an fe is less than about 5. After pooling these frequencies, 11 
discrepancies remain, but only 8 of them are independent. In determining 
the expected frequencies, as shown in Chapter VI, the expected 
and observed frequencies are forced to agree in three constants, 
namely, N, X̄, and s. The procedure results in the three restrictions, 

$$\sum f_e = \sum f_o, \qquad \sum f_e X = \sum f_o X, \qquad \sum f_e X^2 = \sum f_o X^2.$$

TABLE 12.3 


χ² Test of Goodness of Fit of Normal Distribution 
(Data from Table 6.2) 


        fo        fe         fo − fe     (fo − fe)²/fe 

         1        .88  } 
         2       1.52  }        .23           .01 
         3       3.37  } 
         8       6.44          1.56           .38 
        12      10.35          1.65           .26 
        13      15.29         −2.29           .34 
        17      19.13         −2.13           .24 
        18      20.80         −2.80           .38 
        18      19.57         −1.57           .13 
        17      15.98          1.02           .07 
        16      11.08          4.92          2.18 
        10       7.04          2.96          1.24 
         2       3.77  } 
         1       2.79  }      −3.56          1.93 

Sum    138     138.01          −.01          7.16 


The value of χ² is 7.16, with n = 11 − 3 = 8. Referring to Table G 
we find that this value corresponds to a P of about .50. In sampling 
from a normal population, under the given conditions, disagreement 
between fo and fe as great as the observed would be expected about 
50 per cent of the time, owing merely to sampling fluctuations. There 
is thus no reason to doubt the assumption of normality in the sampled 
population. It is important to note that the assumption is not proved; 
it is only shown to be tenable by the χ² test. 
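
The pooling of classes and the count of degrees of freedom in the goodness-of-fit test can be sketched as follows. Several things here are assumptions: the helper name pool is illustrative; the second and the last entries of fo and fe (2 and 1.52, 1 and 2.79) are inferred from the column sums and the pooled discrepancies of Table 12.3; and the closed form for the area beyond χ² with an even number of degrees of freedom is a standard result not given in the text.

```python
import math

# fo and fe of Table 12.3, class by class before pooling.
# The 2nd and last pairs are inferred from the column sums (see lead-in).
fo = [1, 2, 3, 8, 12, 13, 17, 18, 18, 17, 16, 10, 2, 1]
fe = [0.88, 1.52, 3.37, 6.44, 10.35, 15.29, 19.13, 20.80,
      19.57, 15.98, 11.08, 7.04, 3.77, 2.79]

def pool(values, head=3, tail=2):
    """Combine the first `head` and the last `tail` classes into single classes."""
    return [sum(values[:head])] + values[head:-tail] + [sum(values[-tail:])]

fo_p, fe_p = pool(fo), pool(fe)
chi2 = sum((o - e) ** 2 / e for o, e in zip(fo_p, fe_p))
df = len(fo_p) - 3  # three restrictions: N, mean, and standard deviation

# Upper-tail area for an even number of degrees of freedom (closed form).
half = chi2 / 2
p = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
print(round(chi2, 2), df, round(p, 2))
```

The probability comes to a little over .50, in keeping with the reading of Table G.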

The chief limitation of χ² in testing goodness of fit is due to its 
failure to regard signs of discrepancies. Inspection of the discrepancies 
of Table 12.3 indicates that the fo are less than the fe in the middle 
classes of the distribution and for the most part greater on either side. 
This suggests that the population form may be somewhat flatter than 
the normal, although the χ² test fails to reveal it. Unless the signs of 
the discrepancies tend to be unpatterned, the χ² test of goodness of fit 
is not appropriate. When the discrepancies in several consecutive classes 
of the distribution are alike in sign, the assumption of normality is better 
tested by use of the g statistics. 


PRECAUTIONS IN USING χ² 


The χ² test for frequency data is widely useful. It is essentially 
simple and easy to apply. When the underlying assumptions are 
satisfied, it is dependable. But the test is not foolproof, and in its use 
and interpretation several precautions should be kept in mind. 

When χ² has only one degree of freedom, no expected frequency 
should be less than 5, and Yates' correction should be applied. When 
χ² has two or more degrees of freedom, the “5 or more” rule may be 
relaxed. Walker and Lev (Ref. 50) suggest that (a) if roughly approximate 
probabilities are acceptable, an expectation of only 2 in a 
cell is sufficient, and (b) if the expectation in all of the cells but one is 5 
or more, an expectation of only 1 in the remaining cell is sufficient 
to provide a fair approximation to the exact probabilities. Cochran 
(Ref. 7) suggests that 80 per cent of the expected frequencies be 5 or 
more and none less than 1. However, when any expected frequency is 
less than about 5, the probabilities can be considered to be only approximate. 


The two commonest mistakes, perhaps, in applying the χ² test to 
frequency data are due to ignoring the frequency of nonoccurrence and 
lack of independence. As an illustration of the former, consider the 
following. In a study of college freshmen achieving less than, as well as, 
and better than expected on the basis of aptitude, an investigator identified 
20 freshmen from each category and gave the 60 a test of listening 
skills. He found that 6 students in the first category, 13 in the second, 
and 14 in the third could be classified as good listeners. To test the 
hypothesis that achievement and listening skills are unrelated, he 
classified the data as shown below, reasoning that under the null hypothesis 
the good listeners would be divided equally among the three 
categories. 


          “Less than”     “As well as”     “Better than” 
fo             6               13               14 
fe            11               11               11 


He obtained a χ² of 3.45 with 2 degrees of freedom and 
concluded that listening skills were not related to achievement, since 
.25 > P > .10. The mistake the investigator made was that of ignoring 
the frequencies of not-good listening. The correct classification is 


                       “Less than”     “As well as”     “Better than” 
GOOD LISTENER:              6               13               14 
NOT GOOD LISTENER:         14                7                6 


When the expected frequencies are computed from the marginal totals 
and formula (12.1) applied, χ² turns out to be 7.67, significant at the 
2.5 per cent level. 
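
The effect of ignoring the frequencies of nonoccurrence is plain when the mistaken and the correct analyses are run side by side. The sketch below uses the listening data above; the variable names are illustrative, and the upper-tail area e^(−χ²/2) for 2 degrees of freedom is a standard result not derived in the text.

```python
import math

good = [6, 13, 14]      # good listeners in the three achievement categories
not_good = [14, 7, 6]   # the frequencies of nonoccurrence that were ignored

def chi_square(observed, expected):
    """Formula (12.1)."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Mistaken analysis: good listeners only, with 33/3 = 11 expected in each category.
wrong = chi_square(good, [sum(good) / 3] * 3)

# Correct analysis: both rows, with expectations from the marginal totals.
n = sum(good) + sum(not_good)
col_totals = [g + b for g, b in zip(good, not_good)]
right = sum(
    chi_square(row, [sum(row) * c / n for c in col_totals])
    for row in (good, not_good)
)

# With 2 degrees of freedom the upper-tail area is exp(-x/2).
print(round(wrong, 2), round(math.exp(-wrong / 2), 2))
print(round(right, 2), round(math.exp(-right / 2), 3))
```

The second row of the table contributes more than half of the correct χ², which is why leaving it out hides the relationship.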

The mistake occasioned by lack of independence of data may be 
illustrated as follows. In a study of racial differences, each member of 
a large sample from three races was given a list of 20 occupations and 
asked to check the 4 he considered most socially desirable. The data 
were classified by occupation and race in a contingency table, and the 
χ² test applied. Such data are not independent, however, and the χ² 
test is not applicable. Independence exists only when each tally in the 
contingency table represents a different individual or event, so that the 
total frequency in the table equals the number in the combined samples. 

To avoid possible confusion, it should be noted that independence 
of data has nothing to do with independence as used when χ² is referred 
to as a test of independence or no relationship. The latter refers to the 
two variables of the contingency table, not to the individual tallies. 


Applications to Ranked Data 


A good many important tests of significance are included under 
methods known as analysis of variance, of which the t tests for differences 
between means, discussed in Chapter XI, are special cases. 
Analysis of variance includes tests of significance for differences between 
three or more independent and related sample means, as well as 
more complex tests, all of which make use of a sampling distribution 
known as the F distribution. The F distribution is beyond the scope 
of this book; however, we can discuss two χ² tests for ranked data 
which are nonparametric analogues of the analysis-of-variance tests 
for differences between three or more independent and related sample 
means. The χ² tests are nonparametric because they involve no assumptions 
about population normality or homogeneity of variance. They are 
widely useful, nearly as powerful as the parametric tests, and easy to 
apply. 




SUM-OF-RANKS OR H TEST 


The Kruskal-Wallis (Ref. 27) sum-of-ranks, or H, test is an exten- 
sion of the Wilcoxon independent-sample test, p. 262. Suppose that 
we have k independent samples of scores which can be combined in an 
ordered series and ranked from 1, smallest, to N, largest, N being the 
total number of scores. If the sampled populations were alike, the total 
sum of ranks would be expected to be divided proportionately among 
the k samples, in accordance with sample size. If the sums were dis- 
proportional beyond sampling tolerance, there would be reason to 
suspect that the populations were different. Thus, the null hypothesis 
that the populations are identical may be tested by comparing the ob- 
served sums of ranks with their values expected if the hypothesis is true. 
The comparison is made by means of the formula 


$$\chi^2 = \frac{12}{N(N + 1)} \sum \frac{T^2}{m} - 3(N + 1), \qquad (12.4)$$


where N is the total number in the combined samples, and T and m
are the sum of ranks and the number of scores in any one sample. As
computed here, χ² has k − 1 degrees of freedom, k being the number of
samples. If no sample has fewer than 3 scores and most have more than
3, the test is dependable.
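
The link between formula (12.4) and the comparison of observed with expected sums of ranks can be written out. The following is a sketch added for illustration, not the author's derivation. It assumes a subscript i for the k samples and writes E_i for the sum of ranks expected in sample i when the populations are alike; both notations are introduced here and do not appear in the text.

```latex
\begin{align*}
\sum_{i=1}^{k} T_i &= \frac{N(N+1)}{2}, \qquad
\sum_{i=1}^{k} m_i = N, \qquad
E_i = \frac{m_i (N+1)}{2}, \\[4pt]
\sum_{i=1}^{k} \frac{(T_i - E_i)^2}{m_i}
  &= \sum_{i=1}^{k} \frac{T_i^2}{m_i} - (N+1) \sum_{i=1}^{k} T_i
     + \frac{(N+1)^2}{4} \sum_{i=1}^{k} m_i
   = \sum_{i=1}^{k} \frac{T_i^2}{m_i} - \frac{N(N+1)^2}{4}, \\[4pt]
\frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{(T_i - E_i)^2}{m_i}
  &= \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{T_i^2}{m_i} - 3(N+1) = \chi^2 .
\end{align*}
```

Read this way, χ² is zero when every sum of ranks equals its expected value and grows as the sums depart from proportionality to sample size.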

Let us apply the H test to the 5 samples of Table 12.4 consisting 
of the IQ's of 24 junior high school pupils classified according to occu- 
pation of father. The hypothesis to be tested is that the five populations 


TABLE 12.4

IQ's of 24 Junior High School Pupils Classified
According to Occupations of Fathers

                  PROFESSIONAL   BUSINESS      SKILLED       SEMISKILLED   UNSKILLED
                  IQ    RANK     IQ    RANK    IQ    RANK    IQ    RANK    IQ    RANK
                  120   24       118   22      118   22      112   17      108   9
                  118   22       116   19½     112   17      109   11      102   2
                  116   19½      110   13½     110   13½     107   7       101   1
                  112   17       110   13½     110   13½     106   5
                                 106   5       108   9       106   5
                                               108   9       105   3

NUMBER, m         4              5             6             6             3
SUM OF RANKS, T   82½            73½           84            48            12




represented by the samples are alike in IQ. The ranks that the 24 IQ's 
would occupy in the combined series are shown in the table, the mean 
rank of the tied IQ's having been assigned to the ties. The respective 
sum of ranks and sample sizes are shown at the foot of the table. When 
we substitute these sums and numbers in (12.4) we have 


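$$\chi^2 = \frac{12}{24(24+1)} \left[ \frac{(82\tfrac{1}{2})^2}{4} + \frac{(73\tfrac{1}{2})^2}{5} + \frac{(84)^2}{6} + \frac{(48)^2}{6} + \frac{(12)^2}{3} \right] - 3(24+1),$$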
and χ² turns out to be equal to 12.80, with 5 − 1 or 4 degrees of
freedom. According to Table G, Appendix, this value is clearly
significant with .025 > P > .010. We may conclude that the populations
represented by the 5 samples are not alike in IQ. More precisely, since
the H test is sensitive mainly to differences in averages, we may
conclude that the populations differ in average IQ.

Like the Wilcoxon test, the H test is an excellent test. It is nearly
as powerful as the parametric analysis-of-variance test and requires no
assumptions of population normality or homogeneity of variance.
Although designed for continuous data, it appears to work well where
ties are present and to be applicable to any sort of data which can be
combined and ranked. The test is easy to apply. All that is necessary
is to combine the samples, rank the combined scores, find the sum of
ranks in each sample, and substitute in formula (12.4). If ties are
numerous, the test may fail to detect significant differences. (See
Ref. 44 for a correction for ties which always increases the value of
χ².)
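
The four steps just listed (combine, rank, sum the ranks, substitute) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `mean_ranks` and `h_test` are made up for this example, no correction for ties is applied, and the data are the IQ's of Table 12.4.

```python
from collections import defaultdict


def mean_ranks(values):
    """Map each distinct value to its rank in the combined series
    (1 = smallest), giving tied values the mean of the ranks they occupy."""
    positions = defaultdict(list)
    for position, value in enumerate(sorted(values), start=1):
        positions[value].append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def h_test(samples):
    """Return (chi-square from formula 12.4, degrees of freedom)."""
    combined = [score for sample in samples for score in sample]
    n_total = len(combined)                     # N in formula (12.4)
    rank_of = mean_ranks(combined)
    total = 0.0
    for sample in samples:
        t = sum(rank_of[score] for score in sample)   # T, sum of ranks
        total += t * t / len(sample)                  # T squared over m
    chi2 = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return chi2, len(samples) - 1


# IQ's of Table 12.4, one list per occupational group (hypothetical variable name).
iq_by_occupation = [
    [120, 118, 116, 112],             # professional
    [118, 116, 110, 110, 106],        # business
    [118, 112, 110, 110, 108, 108],   # skilled
    [112, 109, 107, 106, 106, 105],   # semiskilled
    [108, 102, 101],                  # unskilled
]

chi2, df = h_test(iq_by_occupation)
print(f"chi-square = {chi2:.2f} with {df} degrees of freedom")
```

Run on the Table 12.4 data, the sketch should reproduce the value of about 12.80 with 4 degrees of freedom found above; the probability is then read from Table G as before.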


RANK TEST FOR RELATED SAMPLES 
The rank test for three or more related samples was devised by 
Friedman (Ref. 17). Consider the scores of Table 12.5 obtained in a 


TABLE 12.5 
Scores in Seconds of 7 Subjects in Four Rigidity Tests 


TEST 
BLOCK DESIGN NUMBER SERIES 


SUBJECT ANAGRAMS WATER JAR 
1        14 (2)      12 (1)      15 (3)      27 (4)
2        16 (2)      13 (1)      31 (4)      18 (3)
3        16 (2)       9 (1)      18 (3)      36 (4)
4        13 (1)      18 (2)      20 (3½)     20 (3½)
5         8 (1)      18 (3)      15 (2)      [illegible] (4)
6        12 (2½)     10 (1)      14 (4)      12 (2½)
7        12 (1)      40 (4)      22 (3)      [illegible] (2)




study of the generality of behavioral rigidity. The scores are the differ- 
ences (to the nearest second) between average time spent on “set” items 
and time spent on the extinction item in four tests. At the moment pay 
no attention to the numbers in parentheses. 

When we have quantitative data classified in C columns and R 
rows, we may think of them as C related samples of R scores each or 
as R related samples of C scores each. For example, we may think of 
the scores in Table 12.5 as 4 related samples of 7 each and ask whether 
there are significant differences between columns (tests), apart from 
differences between subjects. Or we may think of the scores as 7 related 
samples of 4 each and ask whether there are significant differences be- 
tween rows (subjects), apart from differences between tests. 

To see whether there are significant differences between columns, 
we test the hypothesis that the 4 samples are from a common popula- 
tion. To test this hypothesis, we rank the scores in each row from 1 
to 4, as shown in the parentheses, and find the sum of ranks in each 
column. If the hypothesis is true, we would expect the 4 sums of ranks 
to differ from each other by amounts no larger than can be attributed 
to chance. The ranks in the first column sum to 11½, those in the
second to 13, those in the third to 22½, and those in the fourth to 23.
To determine whether the differences between these sums are
significant, we substitute in the formula


$$\chi^2 = \frac{12}{RC(C+1)} \sum T^2 - 3R(C+1), \tag{12.5}$$
where R is the number of rows, C the number of columns, and T the
sum of ranks in any column. Here, χ² will have C − 1 degrees of
freedom. Since R is 7, C is 4, and the sums of ranks are 11½, 13, 22½,
and 23, we have by substitution


$$\chi^2 = \frac{12}{7(4)(4+1)} \left[ (11\tfrac{1}{2})^2 + (13)^2 + (22\tfrac{1}{2})^2 + (23)^2 \right] - 3(7)(4+1),$$


so that χ² = 9.56 with 4 − 1 or 3 degrees of freedom. Turning to
Table G at n = 3, we find .025 > P > .010, and conclude that the
differences between columns (tests) are significant at the 2½ per cent
level.

To test for significance of differences between rows (subjects) in
Table 12.5, we would rank the scores in each column from 1 to 7,
compute the sum of ranks in each row, and substitute in (12.5), after
reversing the roles of R and C. It is left as an exercise for the
student to show that the resulting χ² is about 7.4 with 6 degrees of
freedom and the corresponding P greater than .25. Here, however, we are
treating the data as 7 related samples of 4 scores each, and the χ²
approximation of (12.5) underestimates significance. Where there are
fewer than 7 scores per sample or fewer than 4 samples, tables of exact
significance levels should be used. (See Ref. 44.)

The rank test for 3 or more related samples is nearly as powerful 
as the parametric analysis-of-variance test in two-way classification. 
It may be applied to any data which can be classified in rows and 
columns and ranked. It is essentially a test for significant differences 
between averages. When it discredits the null hypothesis, we may con- 
clude that the populations differ in average value. 
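
The rank test for related samples can be sketched in Python along the same lines. The sketch is illustrative only: the function names `row_ranks` and `related_rank_test` are invented for this example, and the input is the set of within-row ranks shown in parentheses in Table 12.5. Ranking those ranks again leaves them unchanged, so the column sums agree with the ones used in the text.

```python
def row_ranks(row):
    """Rank the scores in one row from 1 (smallest) upward,
    giving tied scores the mean of the ranks they occupy."""
    ordered = sorted(row)
    ranks = []
    for score in row:
        first = ordered.index(score) + 1     # lowest rank held by this score
        ties = ordered.count(score)          # number of scores sharing it
        ranks.append(first + (ties - 1) / 2)
    return ranks


def related_rank_test(table):
    """Return (chi-square from formula 12.5, degrees of freedom, column sums)
    for a table given as a list of rows."""
    n_rows = len(table)          # R
    n_cols = len(table[0])       # C
    sums = [0.0] * n_cols        # T for each column
    for row in table:
        for j, rank in enumerate(row_ranks(row)):
            sums[j] += rank
    chi2 = (12.0 / (n_rows * n_cols * (n_cols + 1)) * sum(t * t for t in sums)
            - 3 * n_rows * (n_cols + 1))
    return chi2, n_cols - 1, sums


# Within-row ranks from Table 12.5: one row per subject, one column per test.
table_12_5_ranks = [
    [2, 1, 3, 4],
    [2, 1, 4, 3],
    [2, 1, 3, 4],
    [1, 2, 3.5, 3.5],
    [1, 3, 2, 4],
    [2.5, 1, 4, 2.5],
    [1, 4, 3, 2],
]

chi2, df, sums = related_rank_test(table_12_5_ranks)
print(sums)
print(f"chi-square = {chi2:.2f} with {df} degrees of freedom")
```

The column sums should come out as 11½, 13, 22½, and 23, and χ² as about 9.56 with 3 degrees of freedom. To test rows instead of columns, the same function could be given the transposed table of raw scores, for example `list(zip(*scores))`, which reverses the roles of R and C.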


EXERCISES 


1. In a large high school, first referrals by teachers of "problem students"
to school counselors over a six-week period, considered to be typical, 
were distributed by days as follows: Monday, 25; Tuesday, 10; Wednes- 
day, 18; Thursday, 24; Friday, 36—a total of 113 referrals. Test the 
hypothesis that there is no relationship between referrals and days of 
the week, using the 1 per cent level. 

2. (Data from Allport's Youth's Outlook on the Future.) The numbers in
samples of university students agreeing and disagreeing with the
proposition that the world is a hazardous place in which men are
basically evil and dangerous are shown below, by country. Test the
hypothesis, at the 5 per cent level, that there is no relationship
between country and response. Interpret the results.

                 AGREE    DISAGREE
NEW ZEALAND        23       101
MEXICO             78       137
EGYPT              39        24


3. Show that the χ² test can be applied to the proportions in Exercises 3
and 8, Chapter 10. Why is the z test, as a rule, to be preferred?

4. Fit a normal curve to the distribution of Exercise 6, p. 124, and test for
goodness-of-fit. Interpret the results.

5. In testing the goodness-of-fit of a normal curve to a distribution of 600
scores, a χ² of 18.50 with 12 degrees of freedom resulted. However, the
values of g₁ and g₂ of the distribution were .26 and 2.45, respectively.
What can account for the inconsistency? Is the assumption of normality
in the population sound?

6. An experimenter reported that a goodness-of-fit test of the normal curve
to a sample of data resulted in a χ² of 2.48 with 11 degrees of freedom.
In what way does this report raise doubt?

7. Find or plan research problems to illustrate each of the three general
applications of the χ² test to frequency data.




8. By algebraic proof or by application to the data in a given 2 × 2
contingency table, show that r_φ² = C²/(1 − C²).

9. By algebraic proof or by application to the data in a given 2 × 2
contingency table, show that formula (12.2), disregarding the term N/2,
is equivalent to formula (12.1).

10. In an experimental study, 20 retarded children were divided at random 
into four groups, and each group was trained by a different method. 
The ratios of postexperiment scores to preexperiment scores on a check- 
list of self-sufficiency are shown below. Are the differences between 
methods significant at the 5 per cent level? 


        METHOD
 A      B      C      D
4.3    1.5    2.0    1.2
2.0    1.2    2.7    2.1
4.5    2.3    2.8    3.1
3.5    2.3    2.9    1.5
2.6    2.1    3.4    1.0


11. To see whether different individuals wrote more imaginatively about
some topics than about others, 9 college freshmen were asked to write a
short composition about each of 7 different topics. The compositions
were graded by three different judges in respect to originality or
unusualness on the basis of a ten-point scale. The average grades, to
the nearest whole numbers, are shown below. Test for differences between
compositions and between subjects at the 5 per cent level, using the
rank test for related samples.


                      COMPOSITION
FRESHMAN    A    B    C    D    E    F    G
   1        4    4    7    6    3   [?]   6
   2        2    6    4    4    5    6    3
   3        6    6    7   [?]   8    5    6
   4        4    5    5    5    9    8    8
   5        1    2    2    4    4    2    4
   6        6    6    6    7    7    6    8
   7        2    2    3    5    5    4    4
   8        2    2    3    1    1    2    4
   9        4    4    5    3    5    5    6




REFERENCES

1. Aiken, H. H., et al. Tables of the Cumulative Binomial Probability
   Distribution. Cambridge, Mass.: Harvard University Press, 1955.
2. American Psychological Association. "Technical Recommendations for
   Psychological Tests and Diagnostic Techniques," Psychological
   Bulletin, Volume 51, Supplement, 1954.
3. Bartlett, M. S. "The Effect of Non-Normality on the t-Distribution,"
   Proceedings Cambridge Philosophical Society (1935), 31: 223-231.
4. Berkson, J. "Smoking and Lung Cancer," American Statistician
   (October, 1963), 17: 15-22.
5. Boneau, C. A. "The Effects of Violations of Assumptions Underlying
   the t Test," Psychological Bulletin (1960), 57: 49-64.
6. Brigham, C. C. A Study of Error. New York: College Entrance
   Examination Board, 1932.
7. Cochran, W. G. "Some Methods for Strengthening the Common χ² Tests,"
   Biometrics, 10: 417-451.
8. Davidoff, M. D., and H. W. Goheen. "A Table for the Rapid
   Determination of the Tetrachoric Correlation Coefficient,"
   Psychometrika (1953), 18: 115-121.
9. Eells, K., et al. Intelligence and Cultural Differences. Chicago,
   Ill.: University of Chicago Press, 1951.
10. Eisenhart, C., M. W. Hastay, and W. A. Wallis. Techniques of
    Statistical Analysis. New York: McGraw-Hill, 1947.
11. Fisher, R. A. "On the 'Probable Error' of a Coefficient of
    Correlation Deduced from a Small Sample," Metron (1921), 1: 1-32.
12. Fisher, R. A. "On the Mathematical Foundations of Theoretical
    Statistics," Philosophical Transactions of the Royal Society of
    London, Series A (1922), 222: 309-368.
13. Fisher, R. A. "Theory of Statistical Estimation," Proceedings
    Cambridge Philosophical Society (1925), 22: 700-725.
14. Fisher, R. A. "Applications of 'Student's' Distribution," Metron
    (1925), 5: 90-104.
15. Fisher, R. A. Statistical Methods for Research Workers, 11th Ed.
    Edinburgh: Oliver and Boyd, Ltd., 1950.
16. Flanagan, J. C. "General Considerations in the Selection of Test
    Items and a Short Method of Estimating the Product-Moment
    Coefficient of Correlation from Data at the Tails of the
    Distribution," Journal of Educational Psychology (1939), 30: 674-680.
17. Friedman, M. "The Use of Ranks to Avoid the Assumption of Normality
    Implicit in the Analysis of Variance," Journal of the American
    Statistical Association (1937), 32: 675-701.
18. Galton, Francis. Natural Inheritance. London: Macmillan, 1889.
19. Guilford, J. P. Fundamental Statistics in Psychology and Education.
    New York: McGraw-Hill, 1956.
20. Hotelling, H. "The Selection of Variates for Use in Prediction With
    Some Comments on the General Problem of Nuisance Parameters,"
    Annals of Mathematical Statistics (1940), 11: 271-283.
21. Huck, F. T. The Predictive Efficiency of the American Dental
    Association Aptitude Tests and Predental Grades. Unpublished
    Doctor's Study (University of Pennsylvania, 1957).
22. Huff, D. How to Lie with Statistics. New York: Norton and Company,
    1954.
23. Johnson, P. O. Statistical Methods in Research. Englewood Cliffs,
    N.J.: Prentice-Hall, 1949.
24. Kelley, T. L. Fundamentals of Statistics. Cambridge, Mass.: Harvard
    University Press, 1947.
25. Kelley, T. L. The Kelley Statistical Tables. New York: Macmillan,
    1938.
26. Kendall, M. G. Rank Correlation Methods, 2nd Ed. London: Griffin,
    1955.
27. Kruskal, W. H., and W. A. Wallis. "Use of Ranks in One-Criterion
    Variance Analysis," Journal of the American Statistical Association
    (1952), 47: 583-621.
28. Kuebler, M. A Critical Study of the California and Cowan Personality
    Tests. Unpublished Master's Study (University of Pennsylvania, 1950).
29. Kuder, G. F., and M. W. Richardson. "The Theory of the Estimation
    of Test Reliability," Psychometrika (1937), 2: 151-160.
30. Lindquist, E. F. Educational Measurement. Washington: American
    Council on Education, 1951.
31. McCall, W. A. How to Measure in Education. New York: Macmillan,
    1922.
32. Mueller, C. G. "Numerical Transformations in the Analysis of
    Experimental Data," Psychological Bulletin (1949), 46: 198-223.
33. Natrella, Mary G. "The Relation Between Confidence Intervals and
    Tests of Significance: A Teaching Aid," The American Statistician
    (February, 1960), 14: 20-22.
34. Otis, A. S. Statistical Method in Educational Measurement. New York:
    Harcourt, Brace & World, 1925.
35. Peters, C. C., and W. R. Van Voorhis. Statistical Procedures and
    Their Mathematical Bases. New York: McGraw-Hill, 1940.
36. Reichmann, W. J. Use and Abuse of Statistics. New York: Oxford
    University Press, 1962.
37. Rulon, P. J. "A Simplified Procedure for Determining the Reliability
    of a Test by Split-Halves," Harvard Educational Review (1939),
    9: 99-103.
38. Science (December, 1963), 142: 1529.
39. Smith, J. G., and A. J. Duncan. Elementary Statistics and
    Applications. New York: McGraw-Hill, 1944.
40. "Student" (W. S. Gosset). "The Probable Error of a Mean,"
    Biometrika (1908), 6: 1-25.
41. Tate, M. W. "Operationism, Research, and a Science of Education,"
    Harvard Educational Review (1950), 20: 11-27.
42. Tate, M. W. Statistics in Education. New York: Macmillan, 1955.
43. Tate, M. W. "Statistical Reasoning: Inference and Sample Size,"
    Journal of the Indian Medical Profession (1962), 9: 4153-4157.
44. Tate, M. W., and R. C. Clelland. Nonparametric and Shortcut
    Statistics. Danville, Ill.: The Interstate Printers and Publishers,
    1957.
45. Tate, M. W., and Barbara Stanier. "Errors in Judgment of Good and
    Poor Problem Solvers," Journal of Experimental Education (1964),
    32: 371-376.
46. Thorndike, R. L. Personnel Selection. New York: Wiley, 1949.
47. Thorndike, R. L. "Regression Fallacies in the Matched-Group
    Experiment," Psychometrika (1942), 7: 85-102.
48. Walker, Helen. "Testing a Statistical Hypothesis," Harvard
    Educational Review (1939), 9: 229-240.
49. Walker, Helen. Mathematics Essential for Elementary Statistics.
    New York: Holt, Rinehart & Winston, 1951.
50. Walker, Helen, and J. Lev. Statistical Inference. New York: Holt,
    1953.
51. Wallis, W. A., and H. V. Roberts. Statistics: A New Approach.
    Glencoe, Ill.: Free Press, 1956.
52. Wilcoxon, F. "Individual Comparisons by Ranking Methods,"
    Biometrics (1945), 1: 80-83.
53. Wilson, E. B., Jr. An Introduction to Scientific Research. New York:
    McGraw-Hill, 1952.
54. Yule, G. U., and M. G. Kendall. An Introduction to the Theory of
    Statistics. London: Griffin, 1950.


APPENDIX

Table A. Admission Data and First Semester Performance of 146
         Liberal Arts College Male Freshmen
Table B. Normally Distributed Scores of 400 Individuals, 200 of
         Whom Have Characteristic A and 100 Characteristic B
Table C. Proportion of Total Area Under the Normal Curve Between
         Mean Ordinate and Ordinate at Given z Distance from the Mean
Table D. 90 and 98 Per Cent Sampling Limits of g₁ and g₂ for Samples
         of Various Sizes from a Normal Population
Table E. Values of z_r for Given Values of the Product-Moment
         Coefficient of Correlation
Table F. Values of t Corresponding to Given Probabilities
Table G. Values of χ² Corresponding to Given Probabilities
Table H. A Table of Random Numbers
Table I. Squares and Square Roots
Table J. Squares, Square Roots, Reciprocals: 1-99

Answers to Selected Exercises


TABLE A

ADMISSION DATA AND FIRST SEMESTER PERFORMANCE OF 146
LIBERAL ARTS COLLEGE MALE FRESHMEN

[Table A was printed sideways and its entries are not recoverable from
this copy. The legible column headings are: secondary school (type
attended, class rank), age, socioeconomic rating, New York State
Regents examination, College Entrance Examination, and first semester
college performance.]


TABLE B 


NORMALLY DISTRIBUTED SCORES OF 400 INDIVIDUALS, 200 OF 
WHOM HAVE CHARACTERISTIC A AND 100 CHARACTERISTIC B 


(Data fictitious) 
Each group of three columns gives the individual's number, score, and
characteristic (A, B, or blank).

000 30   | 050 [illegible] | 100 [illegible] | 150 [illegible] | 200 50 A | 250 60 A | 300 50   | 350 41 B
001 40 B | 051 45 A | 101 23 B | 151 44 B | 201 44 B | 251 41   | 301 43   | 351 36 A
002 26 A | 052 33 B | 102 43 A | 152 37 A | 202 32 B | 252 38 A | 302 28   | 352 49 A
003 36 A | 053 35   | 103 33   | 153 49 A | 203 40 A | 253 43 A | 303 56 A | 353 29 B
004 45 A | 054 38 A | 104 47 A | 154 40 A | 204 35 A | 254 22   | 304 30 A | 354 45 A
005 37 B | 055 11   | 105 40 A | 155 50 B | 205 42 A | 255 32   | 305 48   | 355 39
006 41   | 056 36 A | 106 28 A | 156 33   | 206 34 A | 256 39 A | 306 39 B | 356 47 A
007 51 A | 057 24 A | 107 44 B | 157 39 A | 207 27   | 257 37 A | 307 23   | 357 24 A
008 35 B | 058 37 B | 108 30   | 158 22 B | 208 49 A | 258 28   | 308 31 A | 358 33 A
009 25   | 059 51   | 109 38 B | 159 20 A | 209 37 B | 259 34 A | 309 38 B | 359 49 A
010 31 A | 060 41 A | 110 45 A | 160 31 A | 210 46   | 260 40 B | 310 40   | 360 21 B
011 44   | 061 30   | 111 37 A | 161 42 B | 211 25 A | 261 29 B | 311 25 A | 361 39 A
012 44 A | 062 32 A | 112 25 A | 162 35   | 212 46 A | 262 41 A | 312 37 B | 362 40 A
013 38 B | 063 45   | 113 42 A | 163 47 A | 213 51   | 263 38 B | 313 50 A | 363 32 A
014 48 A | 064 38 A | 114 27 A | 164 29 A | 214 28 A | 264 44 A | 314 44 A | 364 41 A
015 39   | 065 42 B | 115 39 A | 165 38 B | 215 34 B | 265 29 A | 315 27   | 365 53
016 29 B | 066 31   | 116 32 A | 166 52   | 216 33   | 266 53 A | 316 43 A | 366 45 B
017 19 B | 067 45 A | 117 32 A | 167 32 A | 217 41 B | 267 35 B | 317 37   | 367 37
018 15   | 068 39 A | 118 41   | 168 44 B | 218 23 A | 268 46 B | 318 40 B | 368 26 A
019 48 A | 069 25 A | 119 21 A | 169 51 A | 219 45 A | 269 30 A | 319 42 B | 369 30 A
020 42   | 070 38 A | 120 43 A | 170 33   | 220 38 A | 270 31 B | 320 33 B | 370 47 A
021 33   | 071 42 B | 121 37 A | 171 13 B | 221 45 A | 271 49 A | 321 20 B | 371 40
022 46 A | 072 52   | 122 28 B | 172 46 B | 222 54 B | 272 24 A | 322 31   | 372 49 A
023 32   | 073 30 A | 123 39 B | 173 34 A | 223 28 A | 273 52   | 323 45 A | 373 34 B
024 49 B | 074 41 B | 124 41 B | 174 43 A | 224 31   | 274 39   | 324 50   | 374 42 A
025 33   | 075 47 A | 125 34 A | 175 30 A | 225 39 B | 275 57 A | 325 36   | 375 27
026 54 B | 076 34   | 126 45   | 176 45 A | 226 45 A | 276 31 A | 326 41 B | 376 27 A
027 39   | 077 44   | 127 36 A | 177 50 A | 227 48   | 277 36 B | 327 47 B | 377 35 B
028 27   | 078 37 A | 128 26   | 178 36 B | 228 26 A | 278 33 B | 328 26 B | 378 41 A
029 35   | 079 26   | 129 35 A | 179 28 A | 229 35   | 279 42 A | 329 35 B | 379 38
030 47 A | 080 32   | 130 43 A | 180 46   | 230 48 A | 280 33 A | 330 40 B | 380 44 A
031 42 A | 081 39   | 131 47 A | 181 57 A | 231 36 A | 281 53   | 331 48   | 381 21 A
032 55 A | 082 29 B | 132 34 B | 182 32   | 232 42 B | 282 41 A | 332 32 A | 382 51 A
033 32 A | 083 36 B | 133 43 A | 183 40 A | 233 53 A | 283 45 B | 333 46   | 383 39 A
034 44 B | 084 53 A | 134 56 B | 184 39   | 234 31   | 284 27 A | 334 39 A | 384 34 A
035 52   | 085 28 A | 135 42 A | 185 43 B | 235 44 B | 285 51 A | 335 42 A | 385 38 B
036 19   | 086 48 A | 136 69 A | 186 48 B | 236 57 A | 286 42 A | 336 16 A | 386 50
037 30 A | 087 38   | 137 41 A | 187 33 A | 237 34 A | 287 47   | 337 54 B | 387 55 A
038 43 A | 088 49 B | 138 52 A | 188 46 B | 238 42 B | 288 36 A | 338 38 B | 388 36 A
039 55 B | 089 50 A | 139 37 A | 189 48   | 239 49 A | 289 44 A | 339 58   | 389 54 B
040 55 A | 090 35 B | 140 54   | 190 48 A | 240 37 A | 290 38 A | 340 29 A | 390 46
041 23   | 091 58 A | 141 46   | 191 24 B | 241 61 A | 291 54 A | 341 41 B | 391 49 A
042 49 A | 092 46 A | 142 51 B | 192 47 B | 242 48 A | 292 50 B | 342 56 A | 392 22
043 43 B | 093 59 A | 143 34 B | 193 59   | 243 31 A | 293 34   | 343 37   | 393 43 A
044 52   | 094 36   | 144 47 B | 194 40 A | 244 18 A | 294 47 A | 344 51   | 394 55 B
045 36 A | 095 61 A | 145 58   | 195 64 B | 245 43 B | 295 41   | 345 62 A | 395 35 A
046 37 A | 096 47   | 146 42 A | 196 65   | 246 51 A | 296 46 B | 346 43 A | 396 63 B
047 38 B | 097 56   | 147 17 A | 197 35 A | 247 40 A | 297 43 B | 347 40   | 397 44 A
048 46   | 098 29 A | 148 67 A | 198 60   | 248 53 A | 298 53 B | 348 52 B | 398 52 A
049 50 A | 099 40   | 149 31 A | 199 57 B | 249 35 A | 299 36 B | 349 48   | 399 33


TABLE C
PROPORTION OF TOTAL AREA UNDER THE NORMAL CURVE
BETWEEN MEAN ORDINATE AND ORDINATE AT GIVEN z
DISTANCE FROM THE MEAN

                      SECOND DECIMAL PLACE IN z
 z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879

0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8   .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389

1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319

1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767

2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2   .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936

2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9   .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986

3.0   .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
3.1   .4990  .4991  .4991  .4991  .4992  .4992  .4992  .4992  .4993  .4993
3.2   .4993  .4993  .4994  .4994  .4994  .4994  .4994  .4995  .4995  .4995
3.3   .4995  .4995  .4995  .4996  .4996  .4996  .4996  .4996  .4996  .4997
3.4   .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4998

3.5   .4998
4.0   .49997
4.5   .499997


TABLE D*
90 AND 98 PERCENT SAMPLING LIMITS OF g1 AND g2 FOR SAMPLES OF VARIOUS SIZES FROM A NORMAL POPULATION

SIZE OF            g1                       g2
SAMPLE        90%      98%         90%           98%
    50       ±.53     ±.79          --            --
   100       ±.39     ±.57      2.35-3.77     2.18-4.39
   125       ±.35     ±.51      2.40-3.70     2.24-4.24
   150       ±.32     ±.46      2.45-3.65     2.29-4.14
   175       ±.30     ±.43      2.48-3.61     2.33-4.05
   200       ±.28     ±.40      2.51-3.57     2.37-3.98
   250       ±.25     ±.36      2.55-3.52     2.42-3.87
   300       ±.23     ±.33      2.59-3.47     2.46-3.79
   350       ±.21     ±.30      2.62-3.44     2.50-3.72
   400       ±.20     ±.28      2.64-3.41     2.52-3.67
   500       ±.18     ±.26      2.67-3.37     2.57-3.60
   600       ±.16     ±.23      2.70-3.34     2.60-3.54
   700       ±.15     ±.22      2.72-3.31     2.62-3.50
   800       ±.14     ±.20      2.74-3.29     2.65-3.46
   900       ±.13     ±.19      2.75-3.28     2.66-3.43
 1,000       ±.13     ±.18      2.76-3.26     2.68-3.41
 1,200       ±.12     ±.16      2.78-3.24     2.71-3.37
 1,600       ±.10     ±.14      2.81-3.21     2.74-3.32
 2,000       ±.09     ±.13      2.83-3.18     2.77-3.28
 5,000       ±.06     ±.08      2.89-3.12     2.85-3.17

* Table D is adapted from Table IV of E. S. Pearson's "A Further Development of Tests of Normality," copyright 1930 by Biometrika, 22:239-249, and used by permission of author and editor.
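
A sample's g1 and g2 can be compared with the limits of Table D along the following lines. This is only a sketch under stated assumptions: it takes g1 as the third central moment divided by the 3/2 power of the second, and g2 as the fourth central moment divided by the square of the second (so that g2 is near 3 for a normal population, as the tabled limits suggest). The function names and the constants `G1_LIMIT_90` and `G2_LIMITS_90` are illustrative; the constants are copied from the row for samples of 100.

```python
def central_moment(scores, k):
    """k-th moment of the scores about their mean (divisor N)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)


def g1(scores):
    """Skewness measure, assumed here to be m3 / m2**1.5."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def g2(scores):
    """Kurtosis measure, assumed here to be m4 / m2**2 (about 3 when normal)."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2


# 90 percent sampling limits for samples of 100, copied from Table D.
G1_LIMIT_90 = 0.39
G2_LIMITS_90 = (2.35, 3.77)


def within_90_percent_limits(g1_value, g2_value):
    """True when both measures fall inside the 90 percent limits for N = 100."""
    low, high = G2_LIMITS_90
    return abs(g1_value) <= G1_LIMIT_90 and low <= g2_value <= high
```

Values falling outside the limits would cast doubt on the hypothesis that the sample came from a normal population; values inside them leave that hypothesis tenable.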


<,`.<——— s -- ox<x>¿ə-=—BÀ .>  .  —əb—` 7 +w w+uw > 


TABLE E*
VALUES OF z_r FOR GIVEN VALUES OF THE PRODUCT-MOMENT COEFFICIENT OF CORRELATION

  r    z_r       r    z_r       r    z_r       r    z_r
 .00   .00      .25   .26      .50   .55      .75   .97
 .01   .01      .26   .27      .51   .56      .76  1.00
 .02   .02      .27   .28      .52   .58      .77  1.02
 .03   .03      .28   .29      .53   .59      .78  1.05
 .04   .04      .29   .30      .54   .60      .79  1.07
 .05   .05      .30   .31      .55   .62      .80  1.10
 .06   .06      .31   .32      .56   .63      .81  1.13
 .07   .07      .32   .33      .57   .65      .82  1.16
 .08   .08      .33   .34      .58   .66      .83  1.19
 .09   .09      .34   .35      .59   .68      .84  1.22
 .10   .10      .35   .37      .60   .69      .85  1.26
 .11   .11      .36   .38      .61   .71      .86  1.29
 .12   .12      .37   .39      .62   .73      .87  1.33
 .13   .13      .38   .40      .63   .74      .88  1.38
 .14   .14      .39   .41      .64   .76      .89  1.42
 .15   .15      .40   .42      .65   .78      .90  1.47
 .16   .16      .41   .44      .66   .79      .91  1.53
 .17   .17      .42   .45      .67   .81      .92  1.59
 .18   .18      .43   .46      .68   .83      .93  1.66
 .19   .19      .44   .47      .69   .85      .94  1.74
 .20   .20      .45   .48      .70   .87      .95  1.83
 .21   .21      .46   .50      .71   .89      .96  1.95
 .22   .22      .47   .51      .72   .91      .97  2.09
 .23   .23      .48   .52      .73   .93      .98  2.30
 .24   .24      .49   .54      .74   .95      .99  2.65

* Table E is adapted from Table V. B. of Fisher, Statistical Methods for Research Workers, published 1950 by Oliver and Boyd, Edinburgh, by permission of the author and publishers.
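
The entries of Table E agree with Fisher's transformation, z_r = ½ log_e[(1 + r)/(1 − r)], which is the inverse hyperbolic tangent of r. A minimal sketch for values not tabled follows; the function names are illustrative and not taken from the text.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z_r for a correlation r, i.e. 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    return math.atanh(r)


def r_from_z(z: float) -> float:
    """Inverse transformation: the correlation corresponding to a given z_r."""
    return math.tanh(z)


# Reproduce three tabled entries to two decimals.
for r in (0.50, 0.90, 0.99):
    print(f"r = {r:.2f}   z_r = {fisher_z(r):.2f}")
```

For a negative r the transformation simply changes sign, so the table need be entered only with the absolute value of r.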




TABLE F*

DEGREES OF                        PROBABILITY†
FREEDOM n      .20     .10     .05    .025     .01    .005   .0005
     1        1.38    3.08    6.31   12.71   31.82   63.66  636.62
     2        1.06    1.89    2.92    4.30    6.96    9.92   31.60
     3         .98    1.64    2.35    3.18    4.54    5.84   12.94
     4         .94    1.53    2.13    2.78    3.75    4.60    8.61
     5         .92    1.48    2.02    2.57    3.36    4.03    6.86
     6         .91    1.44    1.94    2.45    3.14    3.71    5.96
     7         .90    1.42    1.90    2.36    3.00    3.50    5.40
     8         .89    1.40    1.86    2.31    2.90    3.36    5.04
     9         .88    1.38    1.83    2.26    2.82    3.25    4.78

    10         .88    1.37    1.81    2.23    2.76    3.17    4.59
    11         .88    1.36    1.80    2.20    2.72    3.11    4.44
    12         .87    1.36    1.78    2.18    2.68    3.06    4.32
    13         .87    1.35    1.77    2.16    2.65    3.01    4.22
    14         .87    1.34    1.76    2.14    2.62    2.98    4.14
    15         .87    1.34    1.75    2.13    2.60    2.95    4.07
    16         .86    1.34    1.75    2.12    2.58    2.92    4.02
    17         .86    1.33    1.74    2.11    2.57    2.90    3.96
    18         .86    1.33    1.73    2.10    2.55    2.88    3.92
    19         .86    1.33    1.73    2.09    2.54    2.86    3.88
    20         .86    1.32    1.72    2.09    2.53    2.84    3.85
    21         .86    1.32    1.72    2.08    2.52    2.83    3.82
    22         .86    1.32    1.72    2.07    2.51    2.82    3.79
    23         .86    1.32    1.71    2.07    2.50    2.81    3.77
    24         .86    1.32    1.71    2.06    2.49    2.80    3.74
    25         .86    1.32    1.71    2.06    2.48    2.79    3.72
    26         .86    1.32    1.71    2.06    2.48    2.78    3.71
    27         .86    1.31    1.70    2.05    2.47    2.77    3.69
    28         .86    1.31    1.70    2.05    2.47    2.76    3.67
    29         .85    1.31    1.70    2.04    2.46    2.76    3.66
    30         .85    1.31    1.70    2.04    2.46    2.75    3.65
    40         .85    1.30    1.68    2.02    2.42    2.70    3.55
    60         .85    1.30    1.67    2.00    2.39    2.66    3.46
   120         .84    1.29    1.66    1.98    2.36    2.62    3.37
     ∞         .84    1.28    1.64    1.96    2.33    2.58    3.29

* Table F is abridged and adapted from Table III of Fisher and Yates: Statistical Tables for Biological, Agricultural, and Medical Research, published by Oliver and Boyd, Ltd., Edinburgh, by permission of the authors and publisher.

† The probabilities correspond to a positive (or negative) t and are doubled for a two-sided test.
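
The last row of Table F (infinite degrees of freedom) coincides with the deviates of the unit normal curve, and the doubling rule in the second footnote is simple arithmetic. The sketch below illustrates both points; `PROBABILITIES`, `normal_deviate`, and `two_sided_probability` are illustrative names, and the check covers only the last row, not the rows for finite degrees of freedom.

```python
from statistics import NormalDist

# Column headings of Table F: probabilities in one tail.
PROBABILITIES = (0.20, 0.10, 0.05, 0.025, 0.01, 0.005, 0.0005)


def normal_deviate(one_tail_probability: float) -> float:
    """Deviate of the unit normal curve cut off by the given upper-tail area."""
    return NormalDist().inv_cdf(1.0 - one_tail_probability)


def two_sided_probability(one_tail_probability: float) -> float:
    """Probability for a two-sided test: the tabled probability doubled."""
    return 2.0 * one_tail_probability


for p in PROBABILITIES:
    print(f"one tail {p:<7}  two tails {two_sided_probability(p):<6}  "
          f"deviate {normal_deviate(p):.2f}")
```

Thus the entry 1.96 in the .025 column is the familiar deviate for a two-sided test at the .05 level.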


TABLE G*
VALUES OF χ² CORRESPONDING TO GIVEN PROBABILITIES

DEGREES OF FREEDOM n; PROBABILITY

 n      .995      .990      .975      .950   .900   .750   .500   .250   .100   .050   .025   .010   .005
 1  .0000393   .000157   .000982    .00393  .0158   .102   .455   1.32   2.71   3.84   5.02   6.63   7.88
 2     .0100     .0201     .0506      .103   .211   .575   1.39   2.77   4.61   5.99   7.38   9.21   10.6
 3     .0717      .115      .216      .352   .584   1.21   2.37   4.11   6.25   7.81   9.35   11.3   12.8
 4      .207      .297      .484      .711   1.06   1.92   3.36   5.39   7.78   9.49   11.1   13.3   14.9
 5      .412      .554      .831      1.15   1.61   2.67   4.35   6.63   9.24   11.1   12.8   15.1   16.7
 6      .676      .872      1.24      1.64   2.20   3.45   5.35   7.84   10.6   12.6   14.4   16.8   18.5
 7      .989      1.24      1.69      2.17   2.83   4.25   6.35   9.04   12.0   14.1   16.0   18.5   20.3
 8      1.34      1.65      2.18      2.73   3.49   5.07   7.34   10.2   13.4   15.5   17.5   20.1   22.0
 9      1.73      2.09      2.70      3.33   4.17   5.90   8.34   11.4   14.7   16.9   19.0   21.7   23.6
10      2.16      2.56      3.25      3.94   4.87   6.74   9.34   12.5   16.0   18.3   20.5   23.2   25.2
11      2.60      3.05      3.82      4.57   5.58   7.58   10.3   13.7   17.3   19.7   21.9   24.7   26.8
12      3.07      3.57      4.40      5.23   6.30   8.44   11.3   14.8   18.5   21.0   23.3   26.2   28.3
13      3.57      4.11      5.01      5.89   7.04   9.30   12.3   16.0   19.8   22.4   24.7   27.7   29.8
14      4.07      4.66      5.63      6.57   7.79   10.2   13.3   17.1   21.1   23.7   26.1   29.1   31.3
15      4.60      5.23      6.26      7.26   8.55   11.0   14.3   18.2   22.3   25.0   27.5   30.6   32.8
16      5.14      5.81      6.91      7.96   9.31   11.9   15.3   19.4   23.5   26.3   28.8   32.0   34.3
17      5.70      6.41      7.56      8.67   10.1   12.8   16.3   20.5   24.8   27.6   30.2   33.4   35.7
18      6.26      7.01      8.23      9.39   10.9   13.7   17.3   21.6   26.0   28.9   31.5   34.8   37.2
19      6.84      7.63      8.91      10.1   11.7   14.6   18.3   22.7   27.2   30.1   32.9   36.2   38.6
20      7.43      8.26      9.59      10.9   12.4   15.5   19.3   23.8   28.4   31.4   34.2   37.6   40.0
21      8.03      8.90      10.3      11.6   13.2   16.3   20.3   24.9   29.6   32.7   35.5   38.9   41.4
22      8.64      9.54      11.0      12.3   14.0   17.2   21.3   26.0   30.8   33.9   36.8   40.3   42.8
23      9.26      10.2      11.7      13.1   14.8   18.1   22.3   27.1   32.0   35.2   38.1   41.6   44.2
24      9.89      10.9      12.4      13.8   15.7   19.0   23.3   28.2   33.2   36.4   39.4   43.0   45.6
25      10.5      11.5      13.1      14.6   16.5   19.9   24.3   29.3   34.4   37.7   40.6   44.3   46.9
26      11.2      12.2      13.8      15.4   17.3   20.8   25.3   30.4   35.6   38.9   41.9   45.6   48.3
27      11.8      12.9      14.6      16.2   18.1   21.7   26.3   31.5   36.7   40.1   43.2   47.0   49.6
28      12.5      13.6      15.3      16.9   18.9   22.7   27.3   32.6   37.9   41.3   44.5   48.3   51.0
29      13.1      14.3      16.0      17.7   19.8   23.6   28.3   33.7   39.1   42.6   45.7   49.6   52.3
30      13.8      15.0      16.8      18.5   20.6   24.5   29.3   34.8   40.3   43.8   47.0   50.9   53.7

For n > 30, the quantity $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be treated as a normal deviate and referred to the table of normal curve areas.

* Table G is abridged from Catherine M. Thompson's "Table of Percentage Points of the χ² Distribution," copyright 1941 by Biometrika, 32:188-189, and used by permission of the author and editor.
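
The rule given with Table G for more than 30 degrees of freedom can be sketched as follows. The function names are illustrative, and the upper-tail probability is taken from the unit normal curve by way of Python's standard library; this is an approximation, not an exact χ² probability.

```python
import math
from statistics import NormalDist


def chi_square_to_normal_deviate(chi_square: float, n: int) -> float:
    """Approximate normal deviate: sqrt(2 * chi-square) - sqrt(2n - 1)."""
    return math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * n - 1.0)


def approximate_upper_tail(chi_square: float, n: int) -> float:
    """Approximate probability of a chi-square at least this large, n d.f."""
    z = chi_square_to_normal_deviate(chi_square, n)
    return 1.0 - NormalDist().cdf(z)


# Rough check at the edge of the table: n = 30 and the tabled .050 value 43.8.
z = chi_square_to_normal_deviate(43.8, 30)
p = approximate_upper_tail(43.8, 30)
print(f"z = {z:.3f}, approximate probability = {p:.3f}")
```

At n = 30 the approximation gives a probability a little under the tabled .050, which indicates the order of accuracy to be expected just beyond the range of the table.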


TABLE H
A TABLE OF RANDOM NUMBERS

[Grid of random digits indexed by ROW and COLUMN NUMBER (columns 1 to 32); the digits themselves are not recoverable here.]
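
A grid of random digits of the same general layout as Table H can be produced by machine. The sketch below is illustrative only: it does not reproduce the digits of Table H, and `random_digit_table` and its `seed` parameter are hypothetical names chosen for this example.

```python
import random


def random_digit_table(rows: int, columns: int, seed=None):
    """Return `rows` lists, each holding `columns` random digits from 0 to 9.

    Passing a seed makes the grid reproducible from one run to the next.
    """
    rng = random.Random(seed)
    return [[rng.randrange(10) for _ in range(columns)] for _ in range(rows)]


# Print a small grid with 32 columns, mirroring the column count of Table H.
for row_number, digits in enumerate(random_digit_table(5, 32, seed=1965), start=1):
    print(f"{row_number:>3}  " + " ".join(str(d) for d in digits))
```

As with a printed table, one may enter the grid at any row and column and read digits singly or in groups of two or three, according to the size of the population being sampled.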




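TABLE I
SQUARES AND SQUARE ROOTS

 n          n²        √n       √10n
1.00     1.0000   1.00000   3.16228
1.01     1.0201   1.00499   3.17805
1.02     1.0404   1.00995   3.19374
1.03     1.0609   1.01489   3.20936
1.04     1.0816   1.01980   3.22490
1.05     1.1025   1.02470   3.24037
1.06     1.1236   1.02956   3.25576
1.07     1.1449   1.03441   3.27109
1.08     1.1664   1.03923   3.28634
1.09     1.1881   1.04403   3.30151
1.10     1.2100   1.04881   3.31662
1.11     1.2321   1.05357   3.33167
1.12     1.2544   1.05830   3.34664
1.13     1.2769   1.06301   3.36155
1.14     1.2996   1.06771   3.37639
1.15     1.3225   1.07238   3.39116
1.16     1.3456   1.07703   3.40588
1.17     1.3689   1.08167   3.42053
1.18     1.3924   1.08628   3.43511
1.19     1.4161   1.09087   3.44964
1.20     1.4400   1.09545   3.46410
1.21     1.4641   1.10000   3.47851
1.22     1.4884   1.10454   3.49285
1.23     1.5129   1.10905   3.50714
1.24     1.5376   1.11355   3.52136
1.25     1.5625   1.11803   3.53553
1.26     1.5876   1.12250   3.54965
1.27     1.6129   1.12694   3.56371
1.28     1.6384   1.13137   3.57771
1.29     1.6641   1.13578   3.59166
1.30     1.6900   1.14018   3.60555
1.31     1.7161   1.14455   3.61939
1.32     1.7424   1.14891   3.63318
1.33     1.7689   1.15326   3.64692
1.34     1.7956   1.15758   3.66060
1.35     1.8225   1.16190   3.67423
1.36     1.8496   1.16619   3.68782
1.37     1.8769   1.17047   3.70135
1.38     1.9044   1.17473   3.71484
1.39     1.9321   1.17898   3.72827
1.40     1.9600   1.18322   3.74166
1.41     1.9881   1.18743   3.75500
1.42     2.0164   1.19164   3.76829
1.43     2.0449   1.19583   3.78153
1.44     2.0736   1.20000   3.79473
1.45     2.1025   1.20416   3.80789
1.46     2.1316   1.20830   3.82099
1.47     2.1609   1.21244   3.83406
1.48     2.1904   1.21655   3.84708
1.49     2.2201   1.22066   3.86005
1.50     2.2500   1.22474   3.87298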


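 n          n²        √n       √10n
1.50     2.2500   1.22474   3.87298
1.51     2.2801   1.22882   3.88587
1.52     2.3104   1.23288   3.89872
1.53     2.3409   1.23693   3.91152
1.54     2.3716   1.24097   3.92428
1.55     2.4025   1.24499   3.93700
1.56     2.4336   1.24900   3.94968
1.57     2.4649   1.25300   3.96232
1.58     2.4964   1.25698   3.97492
1.59     2.5281   1.26095   3.98748
1.60     2.5600   1.26491   4.00000
1.61     2.5921   1.26886   4.01248
1.62     2.6244   1.27279   4.02492
1.63     2.6569   1.27671   4.03733
1.64     2.6896   1.28062   4.04969
1.65     2.7225   1.28452   4.06202
1.66     2.7556   1.28841   4.07431
1.67     2.7889   1.29228   4.08656
1.68     2.8224   1.29615   4.09878
1.69     2.8561   1.30000   4.11096
1.70     2.8900   1.30384   4.12311
1.71     2.9241   1.30767   4.13521
1.72     2.9584   1.31149   4.14729
1.73     2.9929   1.31529   4.15933
1.74     3.0276   1.31909   4.17133
1.75     3.0625   1.32288   4.18330
1.76     3.0976   1.32665   4.19524
1.77     3.1329   1.33041   4.20714
1.78     3.1684   1.33417   4.21900
1.79     3.2041   1.33791   4.23084
1.80     3.2400   1.34164   4.24264
1.81     3.2761   1.34536   4.25441
1.82     3.3124   1.34907   4.26615
1.83     3.3489   1.35277   4.27785
1.84     3.3856   1.35647   4.28952
1.85     3.4225   1.36015   4.30116
1.86     3.4596   1.36382   4.31277
1.87     3.4969   1.36748   4.32435
1.88     3.5344   1.37113   4.33590
1.89     3.5721   1.37477   4.34741
1.90     3.6100   1.37840   4.35890
1.91     3.6481   1.38203   4.37035
1.92     3.6864   1.38564   4.38178
1.93     3.7249   1.38924   4.39318
1.94     3.7636   1.39284   4.40454
1.95     3.8025   1.39642   4.41588
1.96     3.8416   1.40000   4.42719
1.97     3.8809   1.40357   4.43847
1.98     3.9204   1.40712   4.44972
1.99     3.9601   1.41067   4.46094
2.00     4.0000   1.41421   4.47214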


TABLE I (Continued) 


Vion Vion 


1.41421 447214 : 2 1.58114 00000 
1.41774 4.48330 1.58430 $0999 
1.42127 4.49444 1.58745 5.01996 
1.42478 4.50555 E 1.59060 5.02991 
1.42829 4.51664 x 1.59374 5.03984 


1.43178 | 4.52769 

143527 | 4.53872 4 Eres 
143875 | 4.54973 i 5.06952 
144222 | 4.56070 y 5.07937 
1.44568 | 4.57165 4 5.08920 


1.44914 4.58258 1.61245 5.09902 
1.45258 4.59347 1.61555 5.10882 
1.45602 4.60435 1.61861 5.11859 
1.45945 4.61519 & 1.62173 5.12835 


1.46287 4.62601 xX 1.62481 5.13809 


1.46629 4.63681 x 1.62788 5.14782 
1.46969 4.64758 Y 1.63095 5.15752 
1.47309 4.65833 x 1.63401 5.16720 
1.47648 4.66905 x 1.63707 5.17687 
1.47986 4.67974 1.64012 5.18652 


1.48324 4.69042 1.64317 5.19615 
1.48661 4.70106 1.64621 5.20577 


1.48997 4.71169 £ 1.61924 5.21536 
1.49332 4.72229 1.65227 5.22494 
1.49666 4.73286 1.65529 5.23450 

5.24404 


1.50000 4.74342 1.65831 
1.66132 5.25357 


1.50333 4.15395 

1.50665 4.76445 1.66133 5.26308 

1.50997 4.77493 1.66733 5.27257 

1.51327 4.78539 1.67033 5.28205 
1.67332 5.29150 


1.51658 4.19583 

1.51987 4.80625 1.67631 5.30094 

1.52315 4.81664 1.67929 5.31037 

1.52643 4.82701 : 1.68226 S.31977 
1.68523 5.32917 


1.52971 4.83735 

1.53297 4.84768 T 1.68819 5.33854 

1.53623 4.85798 x 1.69115 5.34790 

1.53948 4.86826 x 1.69411 5.35724 

1.54272 4.87852 1.69706 5.36656 

1.54596 4.88876 1.70000 5.37587 
1.70294 5.38516 


1.54919 4.89898 

1.55242 4.90918 1.70587 5.39444 

1.55563 4.91935 1.70880 5.40370 

1.55885 4.92950 1.71172 5.41295 

1.56205 4.93964 1.71464 5.42218 
1.71756 5.43139 


1.56525 | 494975 
156844 | 4.95984 1.72037 | 5.44059 
157162 | 4.96991 1.72337 | 5.44977 
1.57480 | 4.97996 1.72627 | 545894 
i 1.72916 | 5.46809 


1.57797 4.98999 
133205 | 547723 


1.58114 5.00000 
vi0n 


va vion 


TABLE I (Continued) 


 n          n²        √n       √10n
3.00     9.0000   1.73205   5.47723
3.01     9.0601   1.73494   5.48635
3.02     9.1204   1.73781   5.49545
3.03     9.1809   1.74069   5.50454
3.04     9.2416   1.74356   5.51362
3.05     9.3025   1.74642   5.52268
3.06     9.3636   1.74929   5.53173
3.07     9.4249   1.75214   5.54076
3.08     9.4864   1.75499   5.54977
3.09     9.5481   1.75784   5.55878
3.10     9.6100   1.76068   5.56776
3.11     9.6721   1.76352   5.57674
3.12     9.7344   1.76635   5.58570
3.13     9.7969   1.76918   5.59464
3.14     9.8596   1.77200   5.60357
3.15     9.9225   1.77482   5.61249
3.16     9.9856   1.77764   5.62139
3.17    10.0489   1.78045   5.63028
3.18    10.1124   1.78326   5.63915
3.19    10.1761   1.78606   5.64801
3.20    10.2400   1.78885   5.65685
3.21    10.3041   1.79165   5.66569
3.22    10.3684   1.79444   5.67450
3.23    10.4329   1.79722   5.68331
3.24    10.4976   1.80000   5.69210
3.25    10.5625   1.80278   5.70088
3.26    10.6276   1.80555   5.70964
3.27    10.6929   1.80831   5.71839
3.28    10.7584   1.81108   5.72713
3.29    10.8241   1.81384   5.73585
3.30    10.8900   1.81659   5.74456
3.31    10.9561   1.81934   5.75326
3.32    11.0224   1.82209   5.76194
3.33    11.0889   1.82483   5.77062
3.34    11.1556   1.82757   5.77927
3.35    11.2225   1.83030   5.78792
3.36    11.2896   1.83303   5.79655
3.37    11.3569   1.83576   5.80517
3.38    11.4244   1.83848   5.81378
3.39    11.4921   1.84120   5.82237
3.40    11.5600   1.84391   5.83095
3.41    11.6281   1.84662   5.83952
3.42    11.6964   1.84932   5.84808
3.43    11.7649   1.85203   5.85662
3.44    11.8336   1.85472   5.86515
3.45    11.9025   1.85742   5.87367
3.46    11.9716   1.86011   5.88218
3.47    12.0409   1.86279   5.89067
3.48    12.1104   1.86548   5.89915
3.49    12.1801   1.86815   5.90762
3.50    12.2500   1.87083   5.91608
 n          n²        √n       √10n
3.50    12.2500   1.87083   5.91608
3.51    12.3201   1.87350   5.92453
3.52    12.3904   1.87617   5.93296
3.53    12.4609   1.87883   5.94138
3.54    12.5316   1.88149   5.94979
3.55    12.6025   1.88414   5.95819
3.56    12.6736   1.88680   5.96657
3.57    12.7449   1.88944   5.97495
3.58    12.8164   1.89209   5.98331
3.59    12.8881   1.89473   5.99166
3.60    12.9600   1.89737   6.00000
3.61    13.0321   1.90000   6.00833
3.62    13.1044   1.90263   6.01664
3.63    13.1769   1.90526   6.02495
3.64    13.2496   1.90788   6.03324
3.65    13.3225   1.91050   6.04152
3.66    13.3956   1.91311   6.04979
3.67    13.4689   1.91572   6.05805
3.68    13.5424   1.91833   6.06630
3.69    13.6161   1.92094   6.07454
3.70    13.6900   1.92354   6.08276
3.71    13.7641   1.92614   6.09098
3.72    13.8384   1.92873   6.09918
3.73    13.9129   1.93132   6.10737
3.74    13.9876   1.93391   6.11555
3.75    14.0625   1.93649   6.12372
3.76    14.1376   1.93907   6.13188
3.77    14.2129   1.94165   6.14003
3.78    14.2884   1.94422   6.14817
3.79    14.3641   1.94679   6.15630
3.80    14.4400   1.94936   6.16441
3.81    14.5161   1.95192   6.17252
3.82    14.5924   1.95448   6.18061
3.83    14.6689   1.95704   6.18870
3.84    14.7456   1.95959   6.19677
3.85    14.8225   1.96214   6.20484
3.86    14.8996   1.96469   6.21289
3.87    14.9769   1.96723   6.22093
3.88    15.0544   1.96977   6.22896
3.89    15.1321   1.97231   6.23699
3.90    15.2100   1.97484   6.24500
3.91    15.2881   1.97737   6.25300
3.92    15.3664   1.97990   6.26099
3.93    15.4449   1.98242   6.26897
3.94    15.5236   1.98494   6.27694
3.95    15.6025   1.98746   6.28490
3.96    15.6816   1.98997   6.29285
3.97    15.7609   1.99249   6.30079
3.98    15.8404   1.99499   6.30872
3.99    15.9201   1.99750   6.31664
4.00    16.0000   2.00000   6.32456




TABLE I (Continued) 


Vion 


16.0000 
16.0801 
16.1604 
16.2409 
16.3216 


16.4025 
16.4836 
16.5649 
16.6161 
16.7281 


16.8100 
16.8921 
16.9744 
17.0569 
17.1396 


17.2225 
17.3056 
17.3889 
17.4724 
17.5561 


17.6100 
17.7241 
17.8084 
17.8929 
17.9776 


18.0625 
18.1476 
18.2329 
18.3184 
18.4041 


18.4900 
18.5761 
18.6624 
18.7489 
18.8356 


X 6.32456 20.2500 2.12132 6.70820 
2.00250 6.33246 20.3401 2.12368 6.71565 
2.00499 6.34035 20.4304 2.12603 6.72309 
2.00749 6.34823 20.5209 2.12838 6.73053 

A 6.35610 20.6116 2.13073 6.73795 


2.01246 6.36396 20.7025 
2.01494 6.37181 20.7936 
2.01742 6.37966 20.8849 
2.01990 6.38749 20.9764 


2.02237 


6.39531 21.0681 


2.02485 6.40312 21.1600 
2.02731 6.41093 21.2521 
2.02978 6.41872 21.3444 
2.03224 6.42651 21.4369 


2.03470 
2.03715 


6.43428. 21.5296 


6.44205 


2.03961 6.44981 - 
2.04206 6.45755 21.8089 
2.04450 6.46529 21.9024 


2.04695 
2.04939 


21.9961 
22.0900 


6.47302 
6.48074 


2.05183 6.48845 22.1841 
2.05426 6.49615 22.2784 
2.05670 6.50384 22.3729 
2.05913 6.51153 22.4676 


2.06155 6.51920 22.5625 
2.06398 6.52687 22.6576 
2.06610 6.53452 22.7529 
2.06882 6.54217 22.8484 
2.07123 6.54981 22.9441 
2.07364 6.55744 23.0100 
2.07605 6.56506 23.1361 
2.07846 6.57267 232324 
2.08087 6.58027 23.3289 
2.08327 6.58787 23.4256 


2.13307 


2.14243 
2.4476 


2.15407 


2.16795 


247715 


2.20000 


6.74537 


2.13542 6.75278 
2.13776 6.76018 
2.14009 6.76757 


6.77495 
6.78233 


2.14709 6.78970 
2.14942 6.79706 
2.15174 6.80441 


6.81175 


2.15639 6.81909 
2.15870 6.82642 
2.16102 6.83374 
2.16333 6.84105 
2.16564 6.84836 


6.85565 


2.17025 6.86294 
2.17256 6.87023 
2.17486 6.87750 


6.88477 


2.17945 6.89202 
2.8174 6.89928 
2.18103 6.90652 
2.18632 6.91375 
2.18861 6.92098 
2.19089 6.92820 
2.19317 6.93542 
2.19545 6.94262 
249773 6.94982 


6.95701 


189225 | 2.08567 | 6.59545 23.5225 | 2.20227 | 6.96419 
19.0006 | 2.08806 | 6.60303 23.6196 | 2.20454 | 697137 
19.0969 | 2.09015 | 6.61060 23.7169 | 220681 | 6.97854 
19.1844 2.09284 6.61816 23.8144 2.20907 6.98570 
192721 | 2.09523 | 6.62571 239121 | 221133 | 6.99285 
19.3600 | 2.09762 | 6.63325 240100 | 2.21359 | 7.00000 
194481 | 2.10000 | 6.64078 241081 | 221585 | 7.00714 
19.5364 2.10238 6.64831 24.2064 2.21811 7.01427 
19.6249 2.10476 6.65582 24.3049 2.22036 7.02140 
197136 | 2.10713 | 6.66333 244036 | 2.22261 | 7.02851 
19.8025 2.10950 6.67083 24.5025 2.22486 7.03562 
198916 | 211187 | 6.67832 24.6016 | 222711 | 7.04273 
199809 | 211424 | 6.68581 24.7009 | 2.22935 | 7.04982 
20.0704 2.11660 6.69328 24.8004 2.23159 7.05691 
20.1601 | 2.11896 | 6.70075 24.9001 | 223383 | 7.06399 
20.2500 | 2.12132 | 6.70820 250000 | 2.23607 | 7.07107 

m Va Vion m vn Vion 




 n          n²        √n       √10n
5.00    25.0000   2.23607   7.07107
5.01    25.1001   2.23830   7.07814
5.02    25.2004   2.24054   7.08520
5.03    25.3009   2.24277   7.09225
5.04    25.4016   2.24499   7.09930
5.05    25.5025   2.24722   7.10634
5.06    25.6036   2.24944   7.11337
5.07    25.7049   2.25167   7.12039
5.08    25.8064   2.25389   7.12741
5.09    25.9081   2.25610   7.13442
5.10    26.0100   2.25832   7.14143
5.11    26.1121   2.26053   7.14843
5.12    26.2144   2.26274   7.15542
5.13    26.3169   2.26495   7.16240
5.14    26.4196   2.26716   7.16938
5.15    26.5225   2.26936   7.17635
5.16    26.6256   2.27156   7.18331
5.17    26.7289   2.27376   7.19027
5.18    26.8324   2.27596   7.19722
5.19    26.9361   2.27816   7.20417
5.20    27.0400   2.28035   7.21110
5.21    27.1441   2.28254   7.21803
5.22    27.2484   2.28473   7.22496
5.23    27.3529   2.28692   7.23187
5.24    27.4576   2.28910   7.23878
5.25    27.5625   2.29129   7.24569
5.26    27.6676   2.29347   7.25259
5.27    27.7729   2.29565   7.25948
5.28    27.8784   2.29783   7.26636
5.29    27.9841   2.30000   7.27324
5.30    28.0900   2.30217   7.28011
5.31    28.1961   2.30434   7.28697
5.32    28.3024   2.30651   7.29383
5.33    28.4089   2.30868   7.30068
5.34    28.5156   2.31084   7.30753
5.35    28.6225   2.31301   7.31437
5.36    28.7296   2.31517   7.32120
5.37    28.8369   2.31733   7.32803
5.38    28.9444   2.31948   7.33485
5.39    29.0521   2.32164   7.34166
5.40    29.1600   2.32379   7.34847
5.41    29.2681   2.32594   7.35527
5.42    29.3764   2.32809   7.36206
5.43    29.4849   2.33024   7.36885
5.44    29.5936   2.33238   7.37564
5.45    29.7025   2.33452   7.38241
5.46    29.8116   2.33666   7.38918
5.47    29.9209   2.33880   7.39594
5.48    30.0304   2.34094   7.40270
5.49    30.1401   2.34307   7.40945
5.50    30.2500   2.34521   7.41620


 n          n²        √n       √10n
5.50    30.2500   2.34521   7.41620
5.51    30.3601   2.34734   7.42294
5.52    30.4704   2.34947   7.42967
5.53    30.5809   2.35160   7.43640
5.54    30.6916   2.35372   7.44312
5.55    30.8025   2.35584   7.44983
5.56    30.9136   2.35797   7.45654
5.57    31.0249   2.36008   7.46324
5.58    31.1364   2.36220   7.46994
5.59    31.2481   2.36432   7.47663
5.60    31.3600   2.36643   7.48331
5.61    31.4721   2.36854   7.48999
5.62    31.5844   2.37065   7.49667
5.63    31.6969   2.37276   7.50333
5.64    31.8096   2.37487   7.50999
5.65    31.9225   2.37697   7.51665
5.66    32.0356   2.37908   7.52330
5.67    32.1489   2.38118   7.52994
5.68    32.2624   2.38328   7.53658
5.69    32.3761   2.38537   7.54321
5.70    32.4900   2.38747   7.54983
5.71    32.6041   2.38956   7.55645
5.72    32.7184   2.39165   7.56307
5.73    32.8329   2.39374   7.56968
5.74    32.9476   2.39583   7.57628
5.75    33.0625   2.39792   7.58288
5.76    33.1776   2.40000   7.58947
5.77    33.2929   2.40208   7.59605
5.78    33.4084   2.40416   7.60263
5.79    33.5241   2.40624   7.60920
5.80    33.6400   2.40832   7.61577
5.81    33.7561   2.41039   7.62234
5.82    33.8724   2.41247   7.62889
5.83    33.9889   2.41454   7.63544
5.84    34.1056   2.41661   7.64199
5.85    34.2225   2.41868   7.64853
5.86    34.3396   2.42074   7.65506
5.87    34.4569   2.42281   7.66159
5.88    34.5744   2.42487   7.66812
5.89    34.6921   2.42693   7.67463
5.90    34.8100   2.42899   7.68115
5.91    34.9281   2.43105   7.68765
5.92    35.0464   2.43311   7.69415
5.93    35.1649   2.43516   7.70065
5.94    35.2836   2.43721   7.70714
5.95    35.4025   2.43926   7.71362
5.96    35.5216   2.44131   7.72010
5.97    35.6409   2.44336   7.72658
5.98    35.7604   2.44540   7.73305
5.99    35.8801   2.44745   7.73951
6.00    36.0000   2.44949   7.74597


 n          n²        √n       √10n
6.00    36.0000   2.44949   7.74597
6.01    36.1201   2.45153   7.75242
6.02    36.2404   2.45357   7.75887
6.03    36.3609   2.45561   7.76531
6.04    36.4816   2.45764   7.77174
6.05    36.6025   2.45967   7.77817
6.06    36.7236   2.46171   7.78460
6.07    36.8449   2.46374   7.79102
6.08    36.9664   2.46577   7.79744
6.09    37.0881   2.46779   7.80385
6.10    37.2100   2.46982   7.81025
6.11    37.3321   2.47184   7.81665
6.12    37.4544   2.47386   7.82304
6.13    37.5769   2.47588   7.82943
6.14    37.6996   2.47790   7.83582
6.15    37.8225   2.47992   7.84219
6.16    37.9456   2.48193   7.84857
6.17    38.0689   2.48395   7.85493
6.18    38.1924   2.48596   7.86130
6.19    38.3161   2.48797   7.86766
6.20    38.4400   2.48998   7.87401
6.21    38.5641   2.49199   7.88036
6.22    38.6884   2.49399   7.88670
6.23    38.8129   2.49600   7.89303
6.24    38.9376   2.49800   7.89937
6.25    39.0625   2.50000   7.90569
6.26    39.1876   2.50200   7.91202
6.27    39.3129   2.50400   7.91833
6.28    39.4384   2.50599   7.92465
6.29    39.5641   2.50799   7.93095
6.30    39.6900   2.50998   7.93725
6.31    39.8161   2.51197   7.94355
6.32    39.9424   2.51396   7.94984
6.33    40.0689   2.51595   7.95613
6.34    40.1956   2.51794   7.96241
6.35    40.3225   2.51992   7.96869
6.36    40.4496   2.52190   7.97496
6.37    40.5769   2.52389   7.98123
6.38    40.7044   2.52587   7.98749
6.39    40.8321   2.52784   7.99375
6.40    40.9600   2.52982   8.00000
6.41    41.0881   2.53180   8.00625
6.42    41.2164   2.53377   8.01249
6.43    41.3449   2.53574   8.01873
6.44    41.4736   2.53772   8.02496
6.45    41.6025   2.53969   8.03119
6.46    41.7316   2.54165   8.03741
6.47    41.8609   2.54362   8.04363
6.48    41.9904   2.54558   8.04984
6.49    42.1201   2.54755   8.05605
6.50    42.2500   2.54951   8.06226


 n          n²        √n       √10n
6.50    42.2500   2.54951   8.06226
6.51    42.3801   2.55147   8.06846
6.52    42.5104   2.55343   8.07465
6.53    42.6409   2.55539   8.08084
6.54    42.7716   2.55734   8.08703
6.55    42.9025   2.55930   8.09321
6.56    43.0336   2.56125   8.09938
6.57    43.1649   2.56320   8.10555
6.58    43.2964   2.56515   8.11172
6.59    43.4281   2.56710   8.11788
6.60    43.5600   2.56905   8.12404
6.61    43.6921   2.57099   8.13019
6.62    43.8244   2.57294   8.13634
6.63    43.9569   2.57488   8.14248
6.64    44.0896   2.57682   8.14862
6.65    44.2225   2.57876   8.15475
6.66    44.3556   2.58070   8.16088
6.67    44.4889   2.58263   8.16701
6.68    44.6224   2.58457   8.17313
6.69    44.7561   2.58650   8.17924
6.70    44.8900   2.58844   8.18535
6.71    45.0241   2.59037   8.19146
6.72    45.1584   2.59230   8.19756
6.73    45.2929   2.59422   8.20366
6.74    45.4276   2.59615   8.20975
6.75    45.5625   2.59808   8.21584
6.76    45.6976   2.60000   8.22192
6.77    45.8329   2.60192   8.22800
6.78    45.9684   2.60384   8.23408
6.79    46.1041   2.60576   8.24015
6.80    46.2400   2.60768   8.24621
6.81    46.3761   2.60960   8.25227
6.82    46.5124   2.61151   8.25833
6.83    46.6489   2.61343   8.26438
6.84    46.7856   2.61534   8.27043
6.85    46.9225   2.61725   8.27647
6.86    47.0596   2.61916   8.28251
6.87    47.1969   2.62107   8.28855
6.88    47.3344   2.62298   8.29458
6.89    47.4721   2.62488   8.30060
6.90    47.6100   2.62679   8.30662
6.91    47.7481   2.62869   8.31264
6.92    47.8864   2.63059   8.31865
6.93    48.0249   2.63249   8.32466
6.94    48.1636   2.63439   8.33067
6.95    48.3025   2.63629   8.33667
6.96    48.4416   2.63818   8.34266
6.97    48.5809   2.64008   8.34865
6.98    48.7204   2.64197   8.35464
6.99    48.8601   2.64386   8.36062
7.00    49.0000   2.64575   8.36660




 n          n²        √n       √10n
7.00    49.0000   2.64575   8.36660
7.01    49.1401   2.64764   8.37257
7.02    49.2804   2.64953   8.37854
7.03    49.4209   2.65141   8.38451
7.04    49.5616   2.65330   8.39047
7.05    49.7025   2.65518   8.39643
7.06    49.8436   2.65707   8.40238
7.07    49.9849   2.65895   8.40833
7.08    50.1264   2.66083   8.41427
7.09    50.2681   2.66271   8.42021
7.10    50.4100   2.66458   8.42615
7.11    50.5521   2.66646   8.43208
7.12    50.6944   2.66833   8.43801
7.13    50.8369   2.67021   8.44393
7.14    50.9796   2.67208   8.44985
7.15    51.1225   2.67395   8.45577
7.16    51.2656   2.67582   8.46168
7.17    51.4089   2.67769   8.46759
7.18    51.5524   2.67955   8.47349
7.19    51.6961   2.68142   8.47939
7.20    51.8400   2.68328   8.48528
7.21    51.9841   2.68514   8.49117
7.22    52.1284   2.68701   8.49706
7.23    52.2729   2.68887   8.50294
7.24    52.4176   2.69072   8.50882
7.25    52.5625   2.69258   8.51469
7.26    52.7076   2.69444   8.52056
7.27    52.8529   2.69629   8.52643
7.28    52.9984   2.69815   8.53229
7.29    53.1441   2.70000   8.53815
7.30    53.2900   2.70185   8.54400
7.31    53.4361   2.70370   8.54985
7.32    53.5824   2.70555   8.55570
7.33    53.7289   2.70740   8.56154
7.34    53.8756   2.70924   8.56738
7.35    54.0225   2.71109   8.57321
7.36    54.1696   2.71293   8.57904
7.37    54.3169   2.71477   8.58487
7.38    54.4644   2.71662   8.59069
7.39    54.6121   2.71846   8.59651
7.40    54.7600   2.72029   8.60233
7.41    54.9081   2.72213   8.60814
7.42    55.0564   2.72397   8.61394
7.43    55.2049   2.72580   8.61974
7.44    55.3536   2.72764   8.62554
7.45    55.5025   2.72947   8.63134
7.46    55.6516   2.73130   8.63713
7.47    55.8009   2.73313   8.64292
7.48    55.9504   2.73496   8.64870
7.49    56.1001   2.73679   8.65448
7.50    56.2500   2.73861   8.66025


 n          n²        √n       √10n
7.50    56.2500   2.73861   8.66025
7.51    56.4001   2.74044   8.66603
7.52    56.5504   2.74226   8.67179
7.53    56.7009   2.74408   8.67756
7.54    56.8516   2.74591   8.68332
7.55    57.0025   2.74773   8.68907
7.56    57.1536   2.74955   8.69483
7.57    57.3049   2.75136   8.70057
7.58    57.4564   2.75318   8.70632
7.59    57.6081   2.75500   8.71206
7.60    57.7600   2.75681   8.71780
7.61    57.9121   2.75862   8.72353
7.62    58.0644   2.76043   8.72926
7.63    58.2169   2.76225   8.73499
7.64    58.3696   2.76405   8.74071
7.65    58.5225   2.76586   8.74643
7.66    58.6756   2.76767   8.75214
7.67    58.8289   2.76948   8.75785
7.68    58.9824   2.77128   8.76356
7.69    59.1361   2.77308   8.76926
7.70    59.2900   2.77489   8.77496
7.71    59.4441   2.77669   8.78066
7.72    59.5984   2.77849   8.78635
7.73    59.7529   2.78029   8.79204
7.74    59.9076   2.78209   8.79773
7.75    60.0625   2.78388   8.80341
7.76    60.2176   2.78568   8.80909
7.77    60.3729   2.78747   8.81476
7.78    60.5284   2.78927   8.82043
7.79    60.6841   2.79106   8.82610
7.80    60.8400   2.79285   8.83176
7.81    60.9961   2.79464   8.83742
7.82    61.1524   2.79643   8.84308
7.83    61.3089   2.79821   8.84873
7.84    61.4656   2.80000   8.85438
7.85    61.6225   2.80179   8.86002
7.86    61.7796   2.80357   8.86566
7.87    61.9369   2.80535   8.87130
7.88    62.0944   2.80713   8.87694
7.89    62.2521   2.80891   8.88257
7.90    62.4100   2.81069   8.88819
7.91    62.5681   2.81247   8.89382
7.92    62.7264   2.81425   8.89944
7.93    62.8849   2.81603   8.90505
7.94    63.0436   2.81780   8.91067
7.95    63.2025   2.81957   8.91628
7.96    63.3616   2.82135   8.92188
7.97    63.5209   2.82312   8.92749
7.98    63.6804   2.82489   8.93308
7.99    63.8401   2.82666   8.93868
8.00    64.0000   2.82843   8.94427


TABLE I (Continued) 


Vn Vion Va 


Vion 


64.0000 
64.1601 
64.3204 
64.4809 
64.6416 


64.2025 
64.9636 
65.1249 
65.2864 
65.4481 


65.6100 
65.7721 
65.9344 
66.0969 
66.2596 


66.4225 
66.5856 
66.7489 
66.9124 
67.0761 


67.2400 
67.4041 
67.5684 
67.7329 
67.8976 


68.0625 
68.2276 
68.3929 
68.5584 
68.7241 


68.8900 
69.0561 
69.2224 
69.3889 
69.5556 


69.7225 
69.8896 
70.0569 
70.2244 
70.3921 


70.5600 
70.7281 
70.8964 
71.0649 
71.2336 


71.4025 
71.5716 
71.7409 
71.9104 
72.0801 


72.2500 


nt 


72.2500 2.91548 


2.82843 8.94427 


2.83019 8.94986 72.4201 2.91719 
2.83196 8.95545 72.5904 2.91890 
2.83373 8.96103 72.7609 2.92062 
2.83549 8.96660 72.9316 2.92233 
2.83725 8.97218 73.1025 2.92404 
2.83901 8.97775 73.2736 2.92575 
2.841077 8.98332 73.4449 2.92746 
2.84253 8.98888 73.6164 2.92916 


2.84429 8.99444 73.7881 2.93087 


9.21954 
9.22497 
9.23038 
9.23580 
9.24121 


9.24662 
9.25203 
9.25743 
9.26283 
9.26823 


2.84605 9.00000 73.9600 2.93258 9.27362 
2.84781 9.00555 74.1321 2.93428. 9.27901 
2.84956 9.01110 74.3044 2.93598 9.28440 
2.85132 9.01665 74.4769 2.93769 9.28978 
2.85307 9.02219 74.6496 2.93939 9.29516 
2.85482 9.02774 74.8225 2.94109 | 9.30054 
2.85657 9.03327 74.9956 2.94279 | 9.30591 
2.85832 9.03881 75.1689 2.94449 9.31128 
2.86007 9.04434 75.3424 2.94618 9.31665 
2.86182 9.04986 75.5161 2.94788 | 9.32202 
2.86356 9.05539 75.6900 2.94958 9.32738 
2.86531 9.06091 75.8641 2.95127 9.33274 
2.86705 9.06642 76.0384 2.95296 9.33809 
2.86880 9.07193 76.2129 2.95466 9.34345 
2.87054 9.07744 76.3876 2.95635 9.34880 
2.87228 9.08295 76.5625 2.95804 9.35414 
2.87402 9.08845 76.7376 2.95973 9.35949 
2.87576 9.09395 76.9129 2.96142 9.36483 
2.87750 9.09945 77.0884 2.96311 9.37017 
2.87924 9.10494 77.2641 2.96479 9.37550 
2.88097 9.11043 77.4400 2.96648 9.38083 
2.88271 9.11592 77.6161 2.96816 9.38616 
2.88444 9.12140 77.7924 2.96085 9.39149 
2.88617 9.12688 71.9689 2.97153 9.39681 
2.88791 9.13236 78.1456 2.97321 9.40213 
2.88964 9.13783 78.3225 2.97489 9.40744 
289137 | 9.14330 784996 | 2.97658 | 9.41276 
2.89310 9.14877 78.6769 2.97825 9.41807 
2.89482 9.15423 788544 2.97993 9.42338 
2.89655 9.15969 79.0321 2.98161 9.42868 
2.89828 9.16515 79.2100 2.98329 9.43398 
2.90000 9.17061 79.3881 2.98496 9.13928 
2.90172 9.17606 19.5661 2.98664 9.44458 
2.90345 9.18150 79.7449 2.98831 9.44987 
2.90517 9.18695 79.9236 2.98998 9.45516 
2.90689 | 9.19239 80.1025 | 2.99166 | 9.46044 
2.90861 9.19783 80.2816 2.99333 9.46573 
2.91033 9.20326 80.4609 2.99500 9.47101 
2.91204 | 9.20869 80.6404 2.99666 9.47629 
291376 | 921412 80.8201 | 2.99833 | 9.48156 
291548 | 9.21954 81.0000 | 3.00000 | 9.48683 
| == — 
Vn vion nm Vion 




 n          n²        √n       √10n
9.00    81.0000   3.00000   9.48683
9.01    81.1801   3.00167   9.49210
9.02    81.3604   3.00333   9.49737
9.03    81.5409   3.00500   9.50263
9.04    81.7216   3.00666   9.50789
9.05    81.9025   3.00832   9.51315
9.06    82.0836   3.00998   9.51840
9.07    82.2649   3.01164   9.52365
9.08    82.4464   3.01330   9.52890
9.09    82.6281   3.01496   9.53415
9.10    82.8100   3.01662   9.53939
9.11    82.9921   3.01828   9.54463
9.12    83.1744   3.01993   9.54987
9.13    83.3569   3.02159   9.55510
9.14    83.5396   3.02324   9.56033
9.15    83.7225   3.02490   9.56556
9.16    83.9056   3.02655   9.57079
9.17    84.0889   3.02820   9.57601
9.18    84.2724   3.02985   9.58123
9.19    84.4561   3.03150   9.58645
9.20    84.6400   3.03315   9.59166
9.21    84.8241   3.03480   9.59687
9.22    85.0084   3.03645   9.60208
9.23    85.1929   3.03809   9.60729
9.24    85.3776   3.03974   9.61249
9.25    85.5625   3.04138   9.61769
9.26    85.7476   3.04302   9.62289
9.27    85.9329   3.04467   9.62808
9.28    86.1184   3.04631   9.63328
9.29    86.3041   3.04795   9.63846
9.30    86.4900   3.04959   9.64365
9.31    86.6761   3.05123   9.64883
9.32    86.8624   3.05287   9.65401
9.33    87.0489   3.05450   9.65919
9.34    87.2356   3.05614   9.66437
9.35    87.4225   3.05778   9.66954
9.36    87.6096   3.05941   9.67471
9.37    87.7969   3.06105   9.67988
9.38    87.9844   3.06268   9.68504
9.39    88.1721   3.06431   9.69020
9.40    88.3600   3.06594   9.69536
9.41    88.5481   3.06757   9.70052
9.42    88.7364   3.06920   9.70567
9.43    88.9249   3.07083   9.71082
9.44    89.1136   3.07246   9.71597
9.45    89.3025   3.07409   9.72111
9.46    89.4916   3.07571   9.72625
9.47    89.6809   3.07734   9.73139
9.48    89.8704   3.07896   9.73653
9.49    90.0601   3.08058   9.74166
9.50    90.2500   3.08221   9.74679


90.2500 
90.4401 
90.6304 
90.8209 
91.0116 


91.2025 
91.3936 
91.5849 
91.7764 
91.9681 


92.1600 
92.3521 
92.5444 
92.7369 
92.9296 


93.1225 
93.3156 
93.5089 
93.7024 
93.8961 


94.0900 
94.2841 
94.4784 
94.6729 
94.8676 


95.0625 
95.2576 
95.4529 
95.6484 
95.8441 


96.0400 
96.2361 
96.4324 
96.6289 
96.8256 


97.0225 
97.2196 
97.4169 
97.6144 
97.8121 


98.0100 
98.2081 
98.4064 
98.6049 
98.8036 


99.0025 
99.2016 
99.4009 
99.6004 
99.8001 


100.000 


vi0n 


Va 


3.08221 
3.08383 
3.08545 
3.08707 
3.08869 


3.09031 
3.09192 
3.09354 
3.09516 
3.09677 


3.09839 
3.10000 
3.10161 
3.10322 
3.10483 


3.10644 
3.10805 
3.10966 
3.11127 
3.11288 


3.11448 
3.11609 
3.11769 
3.11929 
3.12090 


3.12750 
3.12410 
3.12570 
3.12730 
3.12890 


3.13050 
3.13209 
3.13369 
3.13528 
3.13688 


3.13847 
3.14006 
3.14166 
3.14325 
3.14484 


3.14643 
3.14802 
3.14960 
3.15119 
3.15278 


3.15436 
3.15595 
3.15753 
3.15911 
3.16070 


3.16228 


9.74619 
9.75192 
9.15105 
9.76217 
9.16729 


9.77241 
9.71153 
9.78264 
9,18775 
9.79285 


9.79796 
9.80306 
9.80816 
9.81326 
9.81835 


9.82344 
9.82853 
9.83362 
9.83870 
9.84378 


9.81886 
9.85393 
9.85901 
9.86108 
9.86914 


9.87421 


9.91968 


9.92472 
9.92915 
9.93479 
9.93982 
9.94485 


9.94987 
9.95490 
9.95992 
9.96494 
9.96995 


9.91497 
9.97998 
9.98499 
9.98999 
9.99500 


10.0000 
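
Any entry of Table I can be recomputed, which is a convenient way to check a doubtful printed digit. The following is only a sketch: the function name `table_i_row` and the convention of passing n in hundredths are assumptions made for the illustration, not part of the text. It prints a fixed number of decimals, so the last row appears as 100.0000 and 10.00000 rather than in the six-figure form used in the table.

```python
from math import sqrt


def table_i_row(hundredths):
    """Return (n, n^2, sqrt(n), sqrt(10n)) as strings, rounded as in Table I.

    `hundredths` is n expressed in hundredths (975 means n = 9.75), so the
    square is formed from integers before it is scaled back down.
    """
    n = hundredths / 100
    n_squared = hundredths * hundredths / 10_000
    return (f"{n:.2f}", f"{n_squared:.4f}",
            f"{sqrt(n):.5f}", f"{sqrt(10 * n):.5f}")


if __name__ == "__main__":
    # Print the rows for n = 9.70 through n = 9.80.
    for h in range(970, 981):
        print(*table_i_row(h))
```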


TABLE J 
SQUARES, SQUARE ROOTS, RECIPROCALS: 1-99 


N N² √N 1/N | N N² √N 1/N | N N² √N 1/N 
1 1 1.000 1.0000 | 34 1,156 5.831 .0294 | 67 4,489 8.185 .0149 
2 4 1.414 .5000 | 35 1,225 5.916 .0286 | 68 4,624 8.246 .0147 
3 9 1.732 .3333 | 36 1,296 6.000 .0278 | 69 4,761 8.307 .0145 
4 16 2.000 .2500 | 37 1,369 6.083 .0270 | 70 4,900 8.367 .0143 
5 25 2.236 .2000 | 38 1,444 6.164 .0263 | 71 5,041 8.426 .0141 
6 36 2.449 .1667 | 39 1,521 6.245 .0256 | 72 5,184 8.485 .0139 
7 49 2.646 .1429 | 40 1,600 6.325 .0250 | 73 5,329 8.544 .0137 
8 64 2.828 .1250 | 41 1,681 6.403 .0244 | 74 5,476 8.602 .0135 
9 81 3.000 .1111 | 42 1,764 6.481 .0238 | 75 5,625 8.660 .0133 
10 100 3.162 .1000 | 43 1,849 6.557 .0233 | 76 5,776 8.718 .0132 
11 121 3.317 .0909 | 44 1,936 6.633 .0227 | 77 5,929 8.775 .0130 
12 144 3.464 .0833 | 45 2,025 6.708 .0222 | 78 6,084 8.832 .0128 
13 169 3.606 .0769 | 46 2,116 6.782 .0217 | 79 6,241 8.888 .0127 
14 196 3.742 .0714 | 47 2,209 6.856 .0213 | 80 6,400 8.944 .0125 
15 225 3.873 .0667 | 48 2,304 6.928 .0208 | 81 6,561 9.000 .0123 
16 256 4.000 .0625 | 49 2,401 7.000 .0204 | 82 6,724 9.055 .0122 
17 289 4.123 .0588 | 50 2,500 7.071 .0200 | 83 6,889 9.110 .0120 
18 324 4.243 .0556 | 51 2,601 7.141 .0196 | 84 7,056 9.165 .0119 
19 361 4.359 .0526 | 52 2,704 7.211 .0192 | 85 7,225 9.220 .0118 
20 400 4.472 .0500 | 53 2,809 7.280 .0189 | 86 7,396 9.274 .0116 
21 441 4.583 .0476 | 54 2,916 7.348 .0185 | 87 7,569 9.327 .0115 
22 484 4.690 .0455 | 55 3,025 7.416 .0182 | 88 7,744 9.381 .0114 
23 529 4.796 .0435 | 56 3,136 7.483 .0179 | 89 7,921 9.434 .0112 
24 576 4.899 .0417 | 57 3,249 7.550 .0175 | 90 8,100 9.487 .0111 
25 625 5.000 .0400 | 58 3,364 7.616 .0172 | 91 8,281 9.539 .0110 
26 676 5.099 .0385 | 59 3,481 7.681 .0169 | 92 8,464 9.592 .0109 
27 729 5.196 .0370 | 60 3,600 7.746 .0167 | 93 8,649 9.644 .0108 
28 784 5.292 .0357 | 61 3,721 7.810 .0164 | 94 8,836 9.695 .0106 
29 841 5.385 .0345 | 62 3,844 7.874 .0161 | 95 9,025 9.747 .0105 
30 900 5.477 .0333 | 63 3,969 7.937 .0159 | 96 9,216 9.798 .0104 
31 961 5.568 .0323 | 64 4,096 8.000 .0156 | 97 9,409 9.849 .0103 
32 1,024 5.657 .0312 | 65 4,225 8.062 .0154 | 98 9,604 9.899 .0102 
33 1,089 5.745 .0303 | 66 4,356 8.124 .0152 | 99 9,801 9.950 .0101 
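
A row of Table J can be reproduced in the same spirit. This is a minimal sketch, assuming the layout shown above (comma-grouped squares, roots to three decimals, reciprocals to four decimals without a leading zero); the function name `table_j_row` is hypothetical.

```python
from math import sqrt


def table_j_row(n):
    """Return (N, N^2, sqrt(N), 1/N) formatted like an entry of Table J."""
    square = f"{n * n:,}"            # thousands separator, e.g. 3,364
    root = f"{sqrt(n):.3f}"          # square root to three decimals
    reciprocal = f"{1 / n:.4f}"      # reciprocal to four decimals
    if n > 1:
        # The table omits the leading zero of a proper decimal fraction.
        reciprocal = reciprocal.lstrip("0")
    return n, square, root, reciprocal


if __name__ == "__main__":
    for n in (5, 32, 58, 80):
        print(*table_j_row(n))
```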




ANSWERS TO SELECTED EXERCISES 


Chapter 1 


3. Any larger body of similar facts or materials of which the sample may 
be considered representative. For example, the students in a given 
psychology class might be a sample of all psychology students in the college. 
Ideally, the research worker begins with a well-defined population and 
randomly draws a sample from it. Practically, in the behavioral sciences, 
he may have to start with an available group (sample) and later have to 
decide to what population his inferences extend. See p. 11. 

. Yes. As sample size increases in random sampling, population 
characteristics are more and more accurately portrayed by the sample. Chance 
is even-handed; if there are more scores in, say, the 35-44 class than in 
other classes in the population, continued sampling will bring it out. 

. If one had to choose between a small random sample and a large 
nonrandom sample, the small sample would ordinarily be preferred. 
Although it may provide little useful information about the population, it 
is less likely to yield misinformation than a large nonrandom sample. 

. Blood is more homogeneous than mental age. 

. No. The first statement implies an ordinarily impossible task. 

. The students in the class are the sample; the instructor's past, present, 
and future students are the population. 

. (a) The donations are so variable that no average provides a fair or 
meaningful summary. (b) The arithmetic means are 17 and 19, 
respectively, but a comparison of means only obscures the fact that 3 of 5 
workers did very well under Method II, 2 workers very badly. (c) The 
sample consists of the 31 who chose to return the questionnaire. Whether 
the other 19 would respond similarly, the researcher cannot say. He 
can conclude that a simple majority of the original sample, 26 of 50, 
favor the proposal, but this is all. (d) Those who continue in school at 
any given level and those who do not are, as groups, different in several 
important ways, such as socioeconomic status and ability. These 
differences, rather than years of schooling, could be the explanation of 
differences in income. (e) In this self-selected sample, one would expect 
to find students who were especially interested in social problems; 
hence, a sample not fairly representative of attitudes and changes in 
attitudes of other students. 

11. Socioeconomic Rating is an attribute or qualitative variable; A, B, and 
C are categories. VAT and Class Rank are quantitative variables; the 
data under VAT form a continuous series; the data under Class Rank 
form a discrete series. 

13. In the sense that, under more precise measurement, an individual's score 
might fall anywhere within the range of the ability tested, not just on a 
whole number. 




14. (a) Continuous. (b) Discrete. (c) Discrete. (d) Continuous. (e) Con- 
tinuous. 

15. (a) Enrollments in liberal-arts colleges during a given year. (b) Enroll- 
ments in different kinds of colleges or in different departments of same 
college. 

16. (a) Discrete quantitative. (b) Continuous quantitative. (c) Qualitative. 
(d) Qualitative. (e) Discrete quantitative. (f) Qualitative. 


Chapter 2 


. Data in which a few values are extreme with respect to the great majority. 
. The range of chronological ages in months is 203 to 290. The 146 ages 
might be grouped in intervals of 6, with top class 288-293. The range of 
Regents’ language is 65 to 99. The 74 scores might be grouped in inter- 
vals of 3, with top class 98-100. The range of Regents' averages is 74.4 
to 97.0. The 76 averages might be grouped in intervals of 2.0, with top 
class 96.5-98.4. The limits of the classes would be expressed to tenths, 
so that the real limits would be to hundredths. Thus, the real limits of 
the class 96.5-98.4 would be 96.45-98.45. Of course, other grouping 
schemes could be defended. 
4. MIDPOINT   CONSECUTIVE SCORES          REAL LIMITS    EXPRESSED LIMITS 
              88, 89                      87.5-89.5      88-89 
              65, 66, 67                                 65-67 
   50         48, 49, 50, 51, 52          47.5-52.5 
              74, 75, ..., 79, 80         73.5-80.5      74-80 
   94.5       90, 91, ..., 98, 99         89.5-99.5 
              3.5, 3.6, 3.7, 3.8, 3.9     3.45-3.95      3.5-3.9 
   2.215      2.20, 2.21, 2.22, 2.23      2.195-2.235 

9. (a) In some classes, the sum of errors tends to be positive; in others, 
negative. (b) If all of the scores in each class fell at some point above 
(or below) the midpoint, the errors would not be compensating. This 
would be the case if the scores were multiples of 5 and the lower limits 
of the classes also multiples of 5. (c) Ordinarily the errors are decreased. 

10. Positively skewed. 
11. Negatively skewed. Positively skewed. 
12. Positively skewed. 


Chapter 3 


1. To reduce, describe, and compare data. 

2. The data suggest fast, careless subjects who may or may not be able to 
solve the problem, and slow, careful subjects who cannot solve the 
problem. 

4. A, 32.7; B, 28.6; C, 26.5; D, 28.8; E, 38.3; F, 37.9; G, 25.5; H, 29.2; 
I, 29.9; J, 29.1; All Schools, 30.51. 




7. $6,566.17. With the top 3 salaries excluded, the median is $6,516.17, a 
change of $50.00. 

8. A, 31.7; B, 27.3; C, 26.7; D, 29.5; E, 38.0; F, 36.4; G, 24.1; H, 28.4; 
I, 29.9; J, 27.4; All Schools, 30.13. 

9. $6,651.29. With the top three salaries excluded, the mean is $6,485.35, a 
change of $165.94. 

14. Random and relatively small errors in the readings. 

16. A might be given a value of 4; B of 3; C of 2; and D of 1. Any average 
obscures the student's strength in mathematics and science and weak- 
ness in English and language. 

17. In a unimodal distribution, with negative skewness, the mean is smallest, 
the mode largest, and the median is between the two; for positive 
skewness, the order is reversed. In a symmetrical distribution, the three 
coincide. 


20. (a) Separate averages for men and women would be more informative. 


Also, the median would be better than the mean, because such age dis- 
tributions are markedly skewed. (b) Data too variable to have a mean- 
ingful average. (c) When the mean and the median are very different, 
the median is the fairer average. (d) Extreme temperatures are con- 
cealed by an average. (e) The correct percentage is 72.5. Unless the 


groups are equal in size, percentages cannot be simply averaged. See 
formula (3.6). 


Chapter 4 


1. » and to supplement averages in 
uses for the measures. 


3. A, 3.01; B, 2.14; C, 5.94; D, 4.42; E, 4.42; F, 4.64; G, 7.88; H, 4.14; 


5. Symmetrical distributions. 


ns are each equal to 4.2, but Q of the į dis- 


t l flects less variation in judges' ratings, 
i.e., more agreement, it is preferred. 


refers to the average of the deviations from the mean. 
9. The AD's are 2.2 and 4.8, respectively; the SD's are 2.8 and 8.2. 

12. Adding or subtracting a constant has no effect on the standard deviation; 
multiplies or divides the standard 


14. Mr = 592.5; sy = 828. 


15. (a) 18.63 and 7.53. (b) 68%, 94%, 100%. 
19. $m_2 = 7.07$, $m_3 = 4.70$, $m_4 = 123.01$, so that $g_1 = .25$ and $g_2 = 2.46$. 
The distribution is positively skewed and platykurtic. 
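
The g statistics of answer 19 follow directly from the moments. The sketch below assumes the usual moment-ratio definitions, g1 = m3 / m2^(3/2) and g2 = m4 / m2^2, under which a normal curve has g2 = 3. These definitions are an assumption here, adopted because they reproduce the values printed in the answer; the function name is hypothetical.

```python
def g_statistics(m2, m3, m4):
    """Return (g1, g2) from the second, third, and fourth central moments."""
    g1 = m3 / m2 ** 1.5   # skewness: positive when the long tail is to the right
    g2 = m4 / m2 ** 2     # kurtosis: below 3 indicates a platykurtic shape
    return g1, g2


if __name__ == "__main__":
    g1, g2 = g_statistics(7.07, 4.70, 123.01)
    print(round(g1, 2), round(g2, 2))
```

With the moments given in the answer, g1 is positive and g2 is less than 3, which agrees with the description of the distribution as positively skewed and platykurtic.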




Chapter 5 


A quartile is a point, not a range or portion. The individual is in the 
lowest quarter or below the first quartile. 


4. 95, 85, 75, 65, 55, 45, 35, 25, 15, 5, respectively. 

5. $P_{25}$ = 28.91; $P_{75}$ = 41.11. 

7. (a) 34, (b) 18, (c) 85, (d) 80. 

9. (a) 8, 10, (b) 15, 4. 

10. Yes. In a normal distribution a standard score of .5 has a percentile 
rank of 69. 

11. (a) The z scores, from highest to lowest, are 2.1, 1.7, 1.3, .9, .5, 0, —.4, 
−.8, −1.2, −1.6, −2.0, −2.5, and −2.9. The corresponding Z scores 
are 71, 67, 63, 59, 55, 50, 46, 42, 38, 34, 30, 25, and 21. The Z' scores 
are 10 times the Z scores. (b) According to Table C, the percentile ranks 
corresponding to the z scores are 98.2, 95.5, 90.3, 81.6, 69.2, 50, 34.5, 
21.2, 11.5, 5.5, 2.3, .6, and .2. These differ from the actual percentile 
ranks shown because the distribution is not normal. (d) .8 and .4; 5.2 
and .4; 18.6 and .4; 6.7 and .4. 
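
The conversions in answer 11 can be sketched in a few lines. The code assumes Z = 50 + 10z, which reproduces the printed Z scores, and obtains normal-curve percentile ranks from the standard library rather than from Table C, so a value may differ from the tabled one in the last digit. The function names are hypothetical.

```python
from statistics import NormalDist


def z_to_Z(z):
    """Convert a standard score z to a Z score (mean 50, SD 10)."""
    return round(50 + 10 * z)


def normal_percentile_rank(z):
    """Percentile rank of z under a normal curve, to one decimal place."""
    return round(100 * NormalDist().cdf(z), 1)


if __name__ == "__main__":
    z_scores = [2.1, 1.7, 1.3, .9, .5, 0, -.4, -.8, -1.2, -1.6, -2.0, -2.5, -2.9]
    print([z_to_Z(z) for z in z_scores])
    print([normal_percentile_rank(z) for z in z_scores])
```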

16. The standard score in weight is 3.0, as compared to 1.6 in height. 

17. 71.75 in. and 155 lbs. 

18. 61.5. 

Chapter 6 

2. Left column, reading down: .5000, .5000, .9544, .9974, .4951, .4495 
Right column, reading down: .0250, .0250, .9500, .4100, .5900, .3000. 

3. (a) −.32 and +.32. (b) −1.28 and +1.28. (c) −1.96 and +1.96. 
(d) −2.58 and +2.58. 

4. (a) 477. (b) 79. (c) 66.6 — 83.4; 66.6, 83.4, and 8.4. (d) 18.5. (e) 
— 18.5. (f) 341 in 500 or 68 in 100; 11.4 in 500 or 2.3 in 100; 11.4 in 500 
or 2.3 in 100. 

5. (a) 78. (b) 33. 

6. The theoretical normal frequency above 87.5 is 1.68; in the 83.5-87.5 
class, 2.28; then, in successive classes, 4.31, 6.82, 9.12, 10.34, 9.93, 8.08, 
5.57, 3.26 and 2.62. 

7. (a) 4% would receive 1; 7%, 2; 12%, 3; 17%, 4; 20%, 5; 17%, 6; 
12%, 7; 7%, 8; and 4%, 9. (b) That the ability tested is distributed 
normally. (c) When the assumption is untenable. 
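
The percentages in answer 7(a) are those of a nine-point normalized scale. As a rough check, the sketch below assumes the conventional boundaries for such a scale, at z = −1.75, −1.25, ..., +1.75 (bands half a standard deviation wide); that boundary convention and the function name are assumptions, not statements from the exercise.

```python
from statistics import NormalDist


def nine_point_percentages():
    """Percent of a normal distribution falling in each of nine score bands.

    Assumes band boundaries at z = -1.75, -1.25, ..., +1.75.
    """
    cdf = NormalDist().cdf
    cuts = [-1.75 + 0.5 * i for i in range(8)]
    edges = [0.0] + [cdf(c) for c in cuts] + [1.0]
    return [round(100 * (hi - lo)) for lo, hi in zip(edges, edges[1:])]


if __name__ == "__main__":
    print(nine_point_percentages())
```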

8. (a) $P_{10}$ = 23.22, $P_{90}$ = 45.88. (b) PR(28) = 23, PR(38) = 65. 
(c) $P_{10}$ = 22.24, $P_{90}$ = 46.20, PR(28) = 22, PR(38) = 63. (d) In the 
construction of norms, when it can be assumed that the population is 
normal. 

9. (b) The z values corresponding to the ratings A, B, C, D, E, of the first 
psychologist are 1.64, 1.04, .13, −.84, and −1.64, respectively; of the 


second, .84, 0, —.52, — 1.04, and — 1.64; of the third, 1.64, 1.04, .39, 
—.25, and — 1.04. Hence, the first child's combined rating is .13 + .84 + 
1.04 or 2.01; that of the second, .13 + 0 + 1.64 or 1.77. 


10. (a) M = 12.43, s = 1.77. The Z scores, reading from the top, are 65, 
59, 53, 48, 42, 36, 31, 25, and 19. (b) The T scores, reading from the top, 
are 67, 59, 52, 46, 40, 37, 35, 32, and 27. (c) The distribution is not 
normal. 

11. RANK:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20. 
    Z SCORE: 70 64 62 59 58 56 55 53 52 51 49 48 47 45 44 42 41 38 36 30. 

12. .159, .159, .682, .023, .023, .954. 

Chapter 7 

3. One procedure would be to give a "conservative-liberal" attitude 
scale to a group of varying ages and correlate attitude scores with ages. 

6. (a) Positive. (b) Probably negative. (c) Positive. (d) Positive but weak. 
(e) Probably negative. (f) Positive. 

9. $r_{xy} = \sum z_x z_y / N$; hence, since it is given that the paired standard scores 
are equal, $r_{xy} = \sum z^2 / N$. But $\sum z^2 / N = \sum x^2 / N s^2 = 1$, since $\sum x^2 / N = s^2$. 
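
The argument of answer 9 can be checked numerically. This is a sketch only: it computes r as the mean product of paired standard scores, using the standard deviation with divisor N, and the scores below are made-up illustrative values, not data from the text.

```python
from statistics import fmean, pstdev


def pearson_r(xs, ys):
    """Product-moment r computed as the mean product of paired z scores."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)   # standard deviations with divisor N
    return fmean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))


if __name__ == "__main__":
    scores = [3, 5, 8, 13, 21]                 # illustrative values only
    rescaled = [2 * x + 7 for x in scores]     # same standard scores as `scores`
    print(pearson_r(scores, rescaled))         # very close to 1
```

Because a positive linear change of scale leaves every standard score unchanged, the paired z scores are equal and r comes out as 1, apart from floating-point rounding.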

12. $r_{xy}$ = .64. 

18. $r_{xy}$ = .48. 

19. No. The relationship is not linear. 

22. (a) 78.0. (b) 29. (c) 71 in 100. (d) 72.7 – 83.3. 

23. (a) See p. 152. (b) 74. (c) See p. 152. (d) 8. (e) 96 in 100. 

24. 62%, 92%, sy. = 6.07. 

28. 36%, 64%, .71. 

29. The regression equation is Y' = 40x — 36.4, 

30. .42. 

31. The standard scores corresponding to 39.4 and 40.6 are 1.19 and .84, 
respectively. The mean of the five highest wives' scores is 44.4, and 
the mean of the paired husbands' scores is 36.4. The corresponding 
standard scores are 1.30 and .81. Thus, in both cases, when extreme 
scores in one variable are selected, their paired scores are, as a group, 
less extreme. 

34. rq = .57. 

35. ra = .93. Because the relationship is not linear. 

36. .14, .09, and .63, respectively. 

37. rp = .43. 

38. $r_{12.3}$ = .19. The relationship between semester averages and hours of 
study, independent of aptitude, is positive. 

Chapter 8 

1. Sampling error is illustrated by the fact that other samples of 15,000, 
whether or not all questionnaires were returned, would show some 
disagreement in average value. Bias may be inferred from the relatively 
large average. It is reasonable to suppose that those with heavier loads 
returned questionnaires. Measurement error is illustrated by inac- 
curacies in estimates of working hours in week by the respondents. 
Note that sampling and measurement errors are unavoidable. 

4. By correlating mathematics and statistics scores. By correlating half- 
test scores and applying formula (8.1) or by Kuder-Richardson methods. 

6. That the half-tests are equivalent. .75. 

7. .86. 

8. 13.5 hrs. 

13. Find the standard deviation of the differences between half-test scores. 

15. (a) 317, 46. (b) 84 in 100, 84 in 100. (c) That the errors of measurement 
are homoscedastic and normally distributed. (d) .83. 

16. .64. That the standard error of measurement is the same in both groups. 

17. .91. 

18. Qualifications needed include: (a) Random errors and relatively large 
groups. (b) Homoscedasticity of errors over entire range. (c) In dis- 
tinguishing between individuals. (d) Assumptions underlying reliability 
estimation satisfied for both tests. 

20. For items 6, 11, and 13, the method of upper-lower halves yields 0, .12, 
and .44. Flanagan's method yields .11, .20, and .80. The point-biserial 
coefficients are .14, .09, and .63. 

21. For items 6, 11, and 13, the method of upper-lower halves yields 0, de, 
and .19. Flanagan's method yields —.23, .21, and .45. The point- 
biserial coefficients are —.12, .23, and .38. 

26. HINT: Show that S and S’ differ by a constant. 


Chapter 9 


2. A sampling problem exists when one generalizes findings to cases not 
included in the study. The population of concern is present and future 
students. The social worker can logically make inferences about this 
population if the dropouts interviewed are representative. 

3. The population is foreign-born, twelve-year olds in the city; the sample, 
the 100 tested; the sampling distribution, the distribution of means of 
all possible random samples of 100 from the population. 

5. Testing hypotheses and determining confidence intervals. A confidence 
interval separates admissible from inadmissible hypotheses. 

6. Measure all members of the population. 

8. "Equally likely" is itself a probability concept. 


9. .035, .035. 
10. .68. 
11. .16, .16, .68. 


12. "Level of significance" is associated with the Type I error; "power of 
the test" with the Type II error. 




Chapter 10 


1. $s_M = 79/\sqrt{137} = 6.75$, $z = (552 - 540)/6.75 = 1.78$. The 
corresponding probability, P, on a two-sided test, is 2(.0375) or .075. The 
hypothesis cannot be rejected at the 5% level. At this level, there is 
insufficient evidence to conclude that the population mean is not 540. 

. Since $s_M = 6.75$, the 95% confidence interval for the population mean 
is $552 \pm 1.96(6.75)$ or 538.8 – 565.2. Any hypothesis proposing a 
population mean of 538.8 or less or 565.2 or more can be rejected at the 
5% level. Any hypothesis proposing a mean between 538.8 and 565.2 
cannot be rejected at the 5% level. In addition to separating admissible 
from inadmissible hypotheses, at a given level, confidence intervals also 
indicate the precision of the estimate. In the present case, the width of 
the interval, 26.4, is 4.8% of the sample mean. 

. (a) The hypothesis to be tested is $H_0\colon \phi = .50$, against the alternative 
hypothesis $H_A\colon \phi > .50$. The one-sided alternative is appropriate 
because we wish to know how sure we can be that $\phi$ is greater than .50. 
Using formula (10.5), we get z = 2.10. The corresponding P, for a 
one-sided test, is .018. If the null hypothesis is true that $\phi = .50$, we 
would obtain a sample value of .61 or more less than 2% of the time. 
We can be fairly sure that $\phi$ is greater than .50, or that a simple 
majority favor X. (b) Applying formulas (10.6), we obtain .51 – .71 as 
the 95% confidence interval for $\phi$. 

. Turning to Table E, we find that an $r_{xy}$ of .54 has a $z_r$ value of .60. 
Since N = 19, $s_{z_r} = 1/\sqrt{19 - 3} = .25$, and $z = (.60 - 0)/.25 = 2.40$. 
The corresponding P, for a two-sided test, is 2(.0082) or .016. 
The hypothesis cannot be rejected at the 1% level. (b) Since $s_{z_r} = .25$ 
and $z_r = .60$, the 99% confidence interval for the population $z_r$ is 
$.60 \pm 2.58(.25)$ or −.045 – 1.245. The corresponding interval for $\rho_{xy}$ is 
−.045 – .845. (c) The risk of the Type II error is large because the 
confidence interval is very wide. (See p. 260.) 

. (a) $s_{D_M} = .792$ and z = 2.69. The corresponding P is 2(.0036) or 
.007. The difference between means is significant at the 1% level. (b) 
The 99% confidence interval is $2.13 \pm 2.58(.792)$ or .09 – 4.17. 

. (a) $p_1$ = 38/52 or .73, $p_2$ = 11/37 or .30, p = .55, q = .45, z = 4.02. 
The corresponding P, for a two-sided test, is less than .00006. The 
difference is significant at the 1% level. Applying the correction for 
continuity, p. 259, we get z = 3.80 and P = 

whether the relationship is significant; Y does not do this. Usually, we 
are interested mainly in whether the relationship is significant. 

. The 99% confidence interval for the difference between population 
proportions, $\phi_1 - \phi_2$, is .18 – .68. 

10. When the samples are combined and the scores ranked, the sum of ranks 
in one sample is 169, in the other 296. By formula (10.16), z = 2.61, 
and P = 2(.0045) or .009. The hypothesis can be rejected at the 5% 
level; in fact, the result is significant at the 1% level. The Wilcoxon T 
test is essentially a test for differences between averages, and we may 
conclude that the populations differ in average value. 

. (a) When every member of the population is measured, there is no 
sampling error. (b) The sample is too small for normal sampling 
distribution procedures. (c) The samples are not independent. Formula 
(10.20) is appropriate. (d) The confidence interval is wide, and the risk 
of accepting a false null hypothesis is relatively great. (e) Small 
differences may be highly significant when samples are large. 

14. Subtracting the item 9 scores from the item 8 scores, we get 14 plus signs 
and 1 minus sign, 15 signs in all. Using formula (10.19), we get z = 3.10. 
The difference is significant at the 5% level; in fact, it is significant at 
the 0.2% level. We can be highly confident that further samples from 
the population will find items 8 and 9 unequal in difficulty. 

. HINT: If the original data were available and subtractions were made 
within pairs, there would be 20 signs of one kind and 5 signs of the 
other kind, 25 signs in all. 

. When the differences in pairs of scores are found and their absolute 
values ranked, the sum of ranks of the three negative differences is 143. 
By formula (10.20), z = 1.88. Since the corresponding P is .06, the 
difference is not significant at the 5% level. 
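
The arithmetic of the first two answers of this chapter, and of the answer based on Table E, can be retraced as follows. This is a sketch under stated assumptions: the figures (mean 552, hypothesized mean 540, s = 79, N = 137; r = .54, N = 19) are read off the worked answers, the function names are hypothetical, and the z transformation is computed with `atanh` instead of being looked up in Table E, so the last digit can differ slightly from the printed results.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, s, n):
    """Return (standard error, z, two-sided P) for a large-sample test of a mean."""
    se = s / sqrt(n)
    z = (sample_mean - mu0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p_two_sided


def confidence_interval(estimate, se, z_crit):
    """Return the limits estimate -/+ z_crit * se."""
    return estimate - z_crit * se, estimate + z_crit * se


def fisher_z_interval(r, n, z_crit):
    """Return (z_r, its standard error, lower r limit, upper r limit)."""
    z_r = atanh(r)                       # the z_r transformation of r
    se = 1 / sqrt(n - 3)
    low, high = confidence_interval(z_r, se, z_crit)
    return z_r, se, tanh(low), tanh(high)  # limits converted back to r


if __name__ == "__main__":
    se, z, p = z_test_mean(552, 540, 79, 137)
    print(round(se, 2), round(z, 2), round(p, 3))
    print([round(v, 1) for v in confidence_interval(552, se, 1.96)])
    print([round(v, 3) for v in fisher_z_interval(.54, 19, 2.58)])
```

The same `confidence_interval` helper serves both cases because each interval has the form estimate ± (critical z)(standard error); only the scale on which the interval is built differs.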


Chapter 11 


. (a) .01, (b) .01, (c) .02, (d) .15, (e) .10, (f) .05. 

For n = 2, the t values are −2.92, −4.30 and 4.30, and 2.92. 

When n is infinitely large, the t distribution is normal. 

The hypothesis to be tested is $H_0\colon \mu = 50$, against the alternative 
hypothesis $H_A\colon \mu \neq 50$. The mean of the sample is 53. Using formula 
(11.1) or (11.2), we get t = .67, with n = 9. The corresponding P, on a 
two-sided test, is greater than .50, and we conclude that the class is not 
an unusual one. 

(a) No. t = 5.49, with n = 4, and .01 > P > .001. (b) No. t = 7.09, 
n = 5, and P < .001. (c) No. t = 1.96, n = 9, and .10 > P > .05. 
We conclude that the difference is not significant at the 5% level, since 
P > .05. 

(a) 105.9 – 118.1, (b) 104.8 – 110.2, (c) −.7 – 9.7. 

Superior, in that it is applicable to samples of any size and is exact. 
Logically restricted to samples from equally variable populations. 




10. No. For the rg of Table 7.5, t = .77, n = 9, P > .40. For the $r_{pb}$ of 
Table 7.6, t = .82, n = 10, P > .40. 

11. .88, .63, .51. 

12. Yes. t = 2.61, n = 83, .02 > P > .01. 


Chapter 12 

1. $\chi^2 = 16.2$, n = 4, P < .005. The hypothesis of no relationship is 
rejected. 

2. $\chi^2 = 35.2$, n = 2, P < .005. The hypothesis of no relationship is 
rejected. 

3. In Exercise 3, when Yates' correction is applied, $\chi^2 = 4.41$, n = 1. 
In Exercise 8, when Yates' correction is applied, $\chi^2 = 14.7$, n = 1. 
The z test is generally preferred because it gives definite P values and 
because it permits interval estimation. 

4. Using the theoretical normal frequencies of Exercise 6, Chapter VI, and 
combining the top two and bottom two classes, we obtain $\chi^2 = 9.80$, 
n = 6. The corresponding P is between .250 and .100. There is 
insufficient reason to conclude that the population is nonnormal. 

5. The inconsistency is accounted for by the fact that chi square and the g 
statistics test for normality in different ways. The latter is the better 
test, and the assumption of population normality is greatly in doubt. 
(See pp. 301-302.) 

6. With $\chi^2 = 2.48$ and n = 11, P > .995. In random sampling, a fit this 
close would occur fewer than 5 times in 1,000. (See p. 293.) 

10. $\chi^2 = 8.12$, n = 3, .050 > P > .025. The differences between methods 
are significant at the 5% level. 

11. In testing for differences between compositions (columns) we have 
$\chi^2 = 15.7$, n = 6, and .025 > P > .010. The differences between 
compositions are significant at the 5% level; in fact, they are 
significant at the 2.5% level. In testing for differences between subjects (rows), 
we have $\chi^2 = 37.9$, n = 8, P < .005. The differences between subjects 
are significant at the 5% level, even at the .5% level. 



Absolute value, 69 
Acceptance, region of, 226, 238, 279 
Accuracy of approximate numbers, 20 
Aiken, H. H., 309 
Alienation, coefficient of, 156 
Alpha error of inference, 225 
Alpha measures of skewness and 
kurtosis, 85n 
American Psychological Association, 
188, 309 
Analogy, 11, 254 
Analysis of variance, 303 
Arbitrary origin, 53 
Area and frequency (see Frequency 
polygon, Histogram, Normal 
curve) 
Arithmetic mean, 50 ff. 
computation 
combined groups, 55 
correlation table, 138 
grouped data, 51-54 
ungrouped data, 51 
and confidence interval for population 
mean, 239-241, 249-250, 280 
and effects of 
constant errors, 199 
errors of grouping, 52 
errors of measurement, 55, 199, 
254-255 


relation to median in skewed 
distribution, 83 
sampling distribution, 236-238 
standard error, 237, 248 
population finite, 253 
uses, 54-57 
weighted, 55 
Association (see Correlation) 
Attenuation, correction for, 199 
Attribute, 16 
correlation of, 165, 299 
Authority, in problem solving, 1 
Average, 44 ff. 
limitations, 57, 63-64, 80 
misuses, 12, 80 
uses, 57-58 
(see also Arithmetic mean, Median, 
Mode) 
Average deviation, 69 ff. 
computation, 69-70 
relation to quartile and standard 
deviations, 81 
standard error, 248 
uses and limitations, 70-71 


Bartlett, M. S., 309 
Berkson, J., 309 

Beta coefficients, 171 

Beta error of inference, 225 




Beta measures of skewness and kurtosis, 
85n 
Bias (see Errors, constant) 
Bimodality 
importance, 46-47 
Binomial population, 241 
Binomial sampling distribution, 242 
approximation by normal distribution, 
242 


Biserial correlation, 163 ff., 207 ff. 
normalized biserial coefficient rb, 165 
Flanagan's approximation, 207 
point biserial coefficient rpb, 164-165 
significance, 281 
Bivariate data, 127 
Boneau, C. A., 309 
Brigham, C. C., 309 


Causation and correlation, 13, 
160-161 
Central limit theorem, 252 
Central tendency, measures of, 42 ff. 
(see also Average) 
Chi-Square sampling distribution, 
289 ff. 


assumptions underlying, 293-294 
and correction for continuity, 295, 
298, 302 
curves, 292 
degrees of freedom, 292, 294, 295, 
298, 302 
development, experimental, 290 
precautions in using, 302-303 
table of, 327 
use of table, 292-293 
uses 
in frequency data, 293-302 
in ranked data, 303-307 
and z ratio, 295 
Class interval (see Frequency 
distribution, terminology) 
Cochran, W. G., 302, 309 
Comparable scores (see Scores, 
comparable) 
Confidence band 
for predicted score, 149-150, 174 
for true score, 197 
Confidence interval, 230-231 
correlation coefficient, 246 
difference between means, 257, 260, 
266, 285, 286 
difference between proportions, 259, 
261 
mean, 239-241, 249-250, 280 
proportion, 243, 250 
other statistics, 247 
and tests of significance, 231, 241, 
259-260 
Confidence limits, 230-231 
(see also Confidence interval) 


Contingency correlation, 299-300 
coefficient of, 299 
interpretation, 300 
significance, 300 
Contingency table, 296 
Continuous series, 15 
Correlation, 126 ff. 
and causation, 13, 160-161 
importance, 127-129 
meaning, 139-140 
(see also Biserial, Contingency, 
Fourfold, Multiple, Partial, 
Rank difference correlation, and 
Correlation, product-moment 
coefficient of) 
Correlation, net, 167, 170 
Correlation, product-moment coefficient 
of, 129 ff. 
assumptions in, 141-142 
attenuation, correction for, 199 
computation 
grouped data, 135-139 
ungrouped data, 129-135 
factors affecting 
errors of measurement, 199-200 
variability, 156-158 
meaning and interpretation, 153-158 
relation to regression coefficients, 144 


sampling distribution, 244-245 
significance, 281 


zr transformation 
in estimation, 246 
in testing hypotheses, 246 
Correlation, product-moment 
coefficients of 


significance of difference between, 
259, 282 


Correlation table, 135-137 

Criterion variable, 146 

Critical ratio, 239n 

Critical region (see Rejection, region of) 

Cumulative frequency curve, 32-34 
Cumulative percentage curve, 96-99 


Davidoff, M. D., 309 
Decile deviation, 68 
standard error, 248 
Deciles, 66 
Degrees of freedom, 275-277 (see also 
Chi square and t sampling 
distributions) 
Determination, coefficient of, 154 
Differences, four explanations, 232-233 
Difficulty, test item, 204-205 
Discrete series, 16 
Discrimination, test item, 205-209 
Dispersion (see Variability) 
Distribution (see Frequency 
distribution, Sampling 
distribution) 


Distribution-free statistics (see 
Nonparametric statistics) 
Duncan, A. J., 310 


Educational Testing Service, 102, 208 
Eells, K., 309 
Efficient statistic, 230 
Eisenhart, C., 309 
Equivalence, coefficient of, 188 
Errors 
constant, 182, 198, 200 
distribution of, 35 
of estimate (see Standard error of 
estimate) 
of grouping, 52 
of inference, 225 


of measurement or observation, 182, 


188-190, 216 
correlated, 200-201 
effect on correlation coefficient, 
199-200 
effect on mean, 55, 199, 254-255 
effect on standard deviation, 199 
effect on statistical inference, 
254-255 
homoscedasticity of, 197 
and interpretation of test scores, 
196 ff. 
and reliability coefficient, 191 ff. 
(see also Standard error of 
measurement) 


sampling, 13, 182, 216 ff. 


Types I and II in testing hypotheses, 
2 


Estimation of parameters 

interval, 230-231 

point, 230 

and sample size, 231, 249-251, 

260-261 

(see also Confidence interval) 
Evidence 

conditions of trustworthy, 181 ff. 

in problem solving, 1 


F sampling distribution, 303 
Fact, 1n, 3 
Fisher, R. A., 4, 5, 6, 128, 129, 244, 
274, 283, 309, 325, 326 
Flanagan, J. C., 207, 208, 309 
Fourfold correlation, 165 ff. 
point coefficient, 165-166 
significance, 298 
in test item analysis, 209-210 
tetrachoric coefficient, 167 
Fourfold table, 166, 298-299 
Frequency curve, cumulative, 32-34 
Frequency distribution, 24 ff. 
characteristics, 43 
construction, 28-29 
of continuous series, 26-28 




descriptive constants, 87 
of discrete series, 25 
graphical representation, 29-34 
terminology 
class interval, 27 
class midpoint, 28 
indicated class limits, 28 
real class limits, 28 
two-way, 135 
types 
bimodal, 46 
leptokurtic, 38 
normal, 35-37, 106 ff. 
platykurtic, 38 
skewed, 37-38 
other, 38-39 
Frequency polygon, 31-32 
area and frequency relationships in, 31 
construction, 34 
Friedman, M., 305, 309 


Galton, Francis, 4, 22, 35, 158, 309 
Geometric mean, 58 
Goheen, H. W., 309 
Goodness of fit, test for, 251, 300-302 
Gosset, W. S. (see "Student")
Grouping data 
assumptions in, 47, 51-52, 136 
errors of, 52 
purposes, 24 
Guilford, J. P., 309 


H test, 304-305 
Harmonic mean, 58 
Hastay, M. W., 309 
Histogram, 30-31
area and frequency relationships in, 
1 


construction, 34 
Homoscedasticity 
assumption of, 148, 157, 197 
meaning, 148 
Hotelling, H., 282, 310 
Huck, F. T., 310 
Huff, D., 310
Hypothesis, alternative, 228 
one-sided, 229 
two-sided, 229 
Hypothesis, null, 222 ff. 
acceptance of, 223 
and Type II error, 225, 227 ff. 
rejection of, 223 
and Type I error, 225, 226-227
testing
and confidence intervals, 231,
259-260 
controlling risk of error, 226-230 
effect of sample size, 229 
effect of statistic used, 229-230 
level of significance, 223 




mistakes or errors in, 225
power of a test,
region of acceptance, 226 ff.
region of rejection, 226 ff. 
subjective features in, 224, 227, 293 
(see also Chi square, Normal, and t
sampling distributions, uses) 


Inertia, in problem solving, 1 
Inference, statistical, 6-8, 216 ff.
effects of errors of measurement,
254-255
mistakes or errors in, 225 
the two general problems, 218-219 
(see also Estimation, Hypothesis, 
testing)
Internal consistency, coefficient of, 
188 
Interval, class, 27 
Interval, confidence (see Confidence 
interval, Estimation) 
Intuition, in problem solving, 1 
Item analysis (see Test item analysis) 
Items, statistical, 16 


Johnson, P. O., 4, 310 


Kelley, T. L., 142, 310 

Kendall, M. G., 286, 310, 311 

Knowledge, sources of, 1 

Kruskal, W. H., 304, 310 

Kuder, G. F., 210, 310 

Kuder-Richardson estimates of test 

reliability, 210 

Kuebler, M., 310 

Kurtosis, 38, 82 ff. 
importance, 87 
measures of, 84-87 


Least squares, method of, 143, 171 

Leptokurtosis, 38 

Lev, J., 302, 311 

Level of significance, 223 

Likelihood, maximum, 230 

Lindquist, E. F., 310 

Linearity of regression 
assumption of, 148, 175 
meaning, 141, 142 


Mann-Whitney U test (see Wilcoxon 
T test) 


McCall, W. A., 121, 310 
Mean (see Arithmetic mean) 


Mean deviation (see Average deviation) 
Median, 47 ff. 


computation 
grouped data, 47-50
ungrouped data, 47 
relation to mean in skewed 
distribution, 83 


standard error, 248 
uses and limitations, 50, 57 
Mesokurtosis, 38 
Misuses of statistics, 12-14 
Mode, 45 ff. 
uses and limitations, 46-47, 57 
Moments, statistical 
computation, 85-86 
defined, 84 
use in measuring skewness and 
kurtosis, 84 ff. 
Mueller, C. G., 252, 310 
Multiple correlation, coefficient of, 
173 


interpretation, 174 ff. 

Multiple regression, 170 ff. 
assumptions in, 175 
equation, 171 
partial regression coefficients, 172 
standard error of estimate, 173-174 
uses, 174-175 


Natrella, Mary G., 310 
Nonnormality 
effects of, 251-253 
Nonparametric methods, 253, 262, 270, 
303 


Normal curve, 106 ff. 
fitting to given distribution, 111-114 
as limiting form, 106-107 
standard or unit form 
areas and frequencies in, 108-111 
table of areas, 323 
uses 


in educational measurements, 
114-122 
in statistical inference (see Normal 
sampling distribution) 
Normal sampling distribution, 235 ff. 
assumptions in, 251 
limitations, 249 
uses 
inferences from 
correlation coefficient, 244-247 
mean, 238-241 
proportion, 241-244
statistics, generally, 247-249
testing significance of difference
between
correlation coefficients, 259
distributions, 262-264, 268-270
means, 256-257, 264-266
proportions, 257-259, 266-268
standard deviations, 259
statistics, generally, 259
Normality 
assumption of, 39, 87, 251 ff. 
tests for, 251, 300-302 
Normalizing data, 116-122, 252 
Null hypothesis (see Hypothesis, null)


Observations, 16 

Open-end distribution, 50 
Organization of statistical data, 24 ff.
Otis, A. S., 194, 310 


Paired data, 263-264, 266 
Parameter, 218n (see also Estimation of 
parameters) 
Partial correlation, 167-170 
assumptions in, 170 
coefficient of, 168 
significance, 247 
standard error, 248 
as correlation of residuals, 168-169 
and experimental control, 170 
use and interpretation, 170 
Pearson, E. S., 251, 324 
Pearson, Karl, 4, 129 
Peatman, J. G., 328 
Percentage curve, cumulative, 96-97 
uses 
comparing distributions, 97-99 
determining percentiles and 
percentile ranks, 97 
Percentile measures of variability, 65-69 
Percentile rank, 91-92 
determination 
by computation, 93 
from normal areas, 116 
from percentile curve, 97
in ordered data, 93-94 
and standard scores, 101, 115-116 
use and limitations, 94-96 
Percentile score (see Percentile rank) 
Percentiles, 66 
determination 
by computation, 66 
from normal curve, 115-116 
from cumulative percentage curve, 
97-98 
Peters, C. C., 310 
Phenomenon, 15 
Platykurtosis, 38 
Population, statistical, 8 ff.
binomial, 241 
distribution, 220 
finite, 7, 253 
infinite, 7, 253 
stratified, 10 
two-fold, 241 
Power of a statistical test, 230 
Prediction, statistical, 147, 171-172 
accuracy of, 148-151 
assumptions and conditions of, 152 
limitations, 151-152, 175
Probability, 219-220 
and inference, 223, 224 
intuitive interpretation, 224n 
Proportion, 17 
as arithmetic mean, 58-59 




and confidence interval for 
population proportion, 243-244 

sample size needed for specified 
reliability, 250 

Proportions 

combining, 58-59, 258 

confidence interval for true difference 
between, 259, 261 

significance of difference between, 
257-259, 266-268 


Qualitative series, 16 
and attributes, 16 
central tendency, 58-59 
combining, 118-119 
transforming, 116-118 
Quantitative series, 15 
characteristics, 42 ff.
continuous, 15 
discrete, 15 
normalizing, 119-121 
Quartile deviation, 67-68 
relation to average and standard 
deviations, 81 
standard error, 248 
uses and limitations, 68, 82 
Quartiles, 66 
standard error, 248 
Questionnaires, reliability of, 166-177, 
204, 209 
Quetelet, A., 4 


Random numbers, table of, 328 
use of table, 9 
Range, 65 
interpercentile, 65 ff. 
semi-interquartile, 67 
Rank difference correlation, 161 ff. 
coefficient ra, 162 
significance, 281 
uses, 162-163 
Ranking data, 94, 122, 161 
Reciprocals, table of, 339 
Regression, linear, 142 ff.
coefficients of, 144, 145, 171 
standard errors, 248 
significance, 247 
lines, equations of, 144, 171 
tendency or "law," 158-160
uses, 146, 153 
(see also Multiple regression, 
Prediction, statistical) 
Reichmann, W. J., 310 
Rejection, region of, 226, 238, 279 
Relative frequency, 17, 110-111, 220 
Reliability of evidence, 181-182, 185 ff. 
Reliability of statistics, 186, 231-232
and sample size, 232, 249-255, 
260-261 
and significance, 232 




Reliability of test scores 
coefficient of, 187-188 
effect of range of talent, 198-199 
as ratio of true score and obtained 
score variances, 192
and standard error of measurement, 
191-193 
interpretation 
data needed in,
estimated vs real, 
in light of use of scores, 198-201 
methods of estimating 
advantages and limitations, 201-202 
assumptions underlying, 195 
half tests, 187-188 
Kuder-Richardson, 210-211 
parallel forms, 187-188 
test-retest, 187-188
Representative sample, selection of, 
Research and statistics, 3 ff., 218-219 
Residuals, 146, 168, 169 
Richardson, M. W., 210, 310 
Roberts, H. V., 10, 160, 252, 253, 311 
Rounding numbers, rules for, 19-20 
Rulon, P. J., 195, 310 


s, as standard deviation of sample, 72
s', as estimate of standard deviation of
population, 274, 283
Sample 
cluster, 10 
distribution, 220 
nonrandom, 10, 253-254 
generalizing from, 11, 254 
limitations, 11 
random, 9 
advantages, 10, 217 
methods of selecting, 8-9 
simple, 10, 217 
stratified, 10 
size, 232, 237, 249, 251, 260-261, 286 
Sampling distribution 
experimental, 221 
in inference, 222 
meaning, 220-222 
(see also Chi square, Normal, and t
sampling distributions) 
Sampling error, 13, 182, 216 ff. 
Scale of scores, 16 
Scatter (see Variability) 
Schafer, R., 329 
Score 
observed or obtained, 189 
percentile (see Percentile rank) 
standard (see Standard scores) 
stanine, 115 
T, 121-122 
true, 189 
z, 99 


Z, 102, 121-122 
Scores, 16 
comparable, 91, 101-102 
scale of, 16 
Sign test, 266-268 
Significance, statistical, 224, 231-232, 
256 


and reliability, 232 
tests of, 231-232 
Skewness, 37-38, 82 ff. 
importance, 87 
measures of, 84-87 
Smith, J. G., 310 
Spearman-Brown formula, 187 
Squares and square roots,
tables of, 330, 339 
Stability, coefficient of, 188 
Standard deviation, 71 ff. 
computation 
combined groups, 77 
correlation table, 138 
grouped data, 73-76 
ungrouped data, 71-73 
and effects of 
constant errors, 199 
errors of measurement, 199, 201 
population estimate, 72, 238, 274 
relation to average and quartile 
deviations, 81 
standard error, 248 
uses and limitations, 78-80, 81-82 
Standard deviations 
combining, 77 
significance of difference between, 
5 


Standard error, 237-238, 248
as standard deviation of sampling 
distribution, 237 
Standard error of a difference between
independent sample 


correlation coefficients, 259 
means, 256 


proportions, 259
standard deviations, 259 
other statistics, 259 
related sample 
means, 264 
other statistics, 268 
Standard error of estimate, 145-146, 
173-174 
and accuracy of prediction, 148-151, 
173-174 
Standard error of measurement, 191 ff. 
methods of estimating, 193-195 
and reliability coefficient, 196


use in interpreting an observed score, 
196-198 


Standard scores, 99 ff.
interpretation and use, 100-101 
and percentile ranks, 101 


in product-moment correlation, 
133-134 
transformations, 101-102 
Stanine, 115 
Statistic, 5
Statistical data, 14 ff.
Statistical methods 
broad uses, 4 
criticism, 3 
necessity 
in inference, 6, 216 ff. 
in reduction of data, 5, 42 
origin, 4 
Statistical series, 15 
continuous, 15 
discrete, 16 
qualitative, 16 
quantitative, 15 
Statistics 
descriptive, 7 
misuses, 12-13 
and research, 3 
sampling, 7 
three meanings of, 4-5 
"Student," 274, 310 
Sum of squares, 72 


t ratio, 274 
and z ratio, 280 
t sampling distribution, 274 ff. 
curves, 278 
degrees of freedom, 275-277 
development, experimental, 274-275 
table of, 326 
use of table, 278 
uses 
inferences from mean, 279-280 
t test of difference between
independent sample means, 
283-285 
related coefficients of correlation, 
282-283 
related sample means, 285-286 
t test of significance of 
point-biserial correlation 
coefficient, 281
product-moment correlation
coefficient, 281
rank-difference correlation 
coefficient, 281 
T score, 121 
t test (see t sampling distribution, uses)
Tate, M. W., 311 
Test-item analysis, 204 ff. 
and test improvement, 212 
test item 
difficulty, 204-205 




discrimination, 205-209 
intercorrelation, 209-210 
validity, 211-212 
variance, 205 
Tests of significance, 224, 231-232 
Tetrachoric correlation (see Fourfold 
correlation) 
Thompson, Catherine, 327 
Thorndike, R. L., 160, 311 
Transformations of 
nonnormal data, 119-121, 252 
qualitative data, 116-118 
scores, 91 ff. 
True score, 189 
2 x 2-fold table (see Fourfold table) 


Validity, 183 ff. 

coefficient, 185 

experimental, 184 

formal, 183 

test item, 211-212 
Van Voorhis, W. R., 310 
Variable, 15, 16 

criterion, 146, 152 

dependent, 145, 146 

independent, 145, 146 

predictor, 146, 152 

qualitative, 16 

quantitative, 16 
Variability 

and coefficient of correlation, 

156-158 
meaning and importance, 63-65 
measures of, interpretation and use, 
80-82 

and reliability coefficient, 198-199 
Variance, 72 

explained and unexplained, 154, 173 
Variance error of estimate, 153 
Variate, 15 


Walker, Helen, 18, 218, 302, 311 

Wallis, W. A., 10, 160, 252, 253, 304, 
309, 310, 311 

Weighted mean, 55 

Wilcoxon, F., 262, 270, 311 

Wilcoxon T test, 262 


Yates, F., 326 

Yates' correction for continuity, 295, 
299, 302 

Yule, G. U., 286, 311 


z score, Z score (see Score) 

zr transformation of rxy, 244-246
standard error, 248 
table of, 325 

